Skin-Path

Citation Author(s):
Hongyan Xu, University of New South Wales
Arcot Sowmya, University of New South Wales
Dadong Wang, CSIRO, Data61
Ian Katz, Southern Sun Pathology Pty Ltd
Submitted by: Hongyan Xu
Last updated: Tue, 11/19/2024 - 23:45
DOI: 10.21227/qna2-nh44

Abstract 

Vision-language (VL) datasets are essential for advancing the capabilities of VL models, particularly in specialized domains like medical imaging. However, existing medical VL datasets are relatively small and predominantly focus on chest X-rays, limiting their applicability to other areas. To address this gap, we introduce the Skin-Path dataset, a comprehensive VL dataset specifically curated for histopathology. This dataset comprises 194 H&E-stained whole slide images (WSIs) from distinct patients, digitized at 20x magnification and annotated with diagnostic reports by senior pathologists. From these WSIs, we extracted 277,761 image patches, each sized 300×300 pixels, accompanied by corresponding captions. The Skin-Path dataset covers 10 distinct skin diseases, including seborrhoeic keratosis, basal cell carcinoma, and squamous cell carcinoma. Our analysis shows that the dataset is diverse, with a word distribution distinct from that of general VL datasets, as visualized through word clouds. This dataset provides a robust foundation for training and evaluating VL models tailored for histopathological applications.

Instructions: 
  1. Download: Download and extract the dataset so that it is organized into Images/ and Captions/ folders.
  2. Usage: Pair image patches with corresponding captions for training or evaluation tasks.
  3. Applications: Use the dataset for tasks like medical report generation or skin disease classification.
  4. Citation: Cite the dataset (DOI: 10.21227/qna2-nh44) in any publications that use it.
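
The pairing step in the instructions above could be sketched in Python roughly as follows. This is a minimal sketch under assumptions not confirmed by this page: that each caption is stored as a UTF-8 .txt file in Captions/ whose file name (without extension) matches the image patch in Images/, and that patches use one of the listed image extensions. The function name pair_patches_with_captions is hypothetical; adjust the matching rule to the actual file layout after extraction.

```python
from pathlib import Path

# Assumed image extensions; the actual patch format is not stated on this page.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def pair_patches_with_captions(root):
    """Return a list of (image_path, caption_text) pairs.

    Assumption: Captions/<name>.txt holds the caption for Images/<name>.<ext>.
    Images without a matching caption file are skipped.
    """
    root = Path(root)
    # Index caption files by file name without extension.
    captions = {p.stem: p for p in (root / "Captions").rglob("*.txt")}

    pairs = []
    for img in sorted((root / "Images").rglob("*")):
        if img.suffix.lower() in IMAGE_EXTS and img.stem in captions:
            text = captions[img.stem].read_text(encoding="utf-8").strip()
            pairs.append((img, text))
    return pairs
```

The resulting list can be wrapped in whatever dataset class a training framework expects, for example by loading each image lazily from its path and tokenizing the caption text.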