Skin-Path
- Submitted by: Hongyan Xu
- Last updated: Tue, 11/19/2024 - 23:45
- DOI: 10.21227/qna2-nh44
Abstract
Vision-language (VL) datasets are essential for advancing the capabilities of VL models, particularly in specialized domains like medical imaging. However, existing medical VL datasets are relatively small and predominantly focus on chest X-rays, limiting their applicability to other areas. To address this gap, we introduce the Skin-Path dataset, a comprehensive VL dataset specifically curated for histopathology. This dataset comprises 194 H&E-stained whole slide images (WSIs) from distinct patients, digitized at 20x magnification and annotated with diagnostic reports by senior pathologists. From these WSIs, we extracted 277,761 image patches, each sized 300×300 pixels, accompanied by corresponding captions. The Skin-Path dataset covers 10 distinct skin diseases, including seborrhoeic keratosis, basal cell carcinoma, and squamous cell carcinoma. Our analysis demonstrates significant diversity in the dataset, with a unique word distribution distinct from general VL datasets, as visualized through word clouds. This dataset provides a robust foundation for training and evaluating VL models tailored for histopathological applications.
- Download: Access and extract the dataset to organize it into Images/ and Captions/ folders.
- Usage: Pair image patches with corresponding captions for training or evaluation tasks.
- Applications: Use the dataset for tasks like medical report generation or skin disease classification.
- Citation: Cite the dataset appropriately in any publications.
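
The pairing step in the usage notes above could be sketched as follows. This is a minimal illustration, not the dataset authors' loader. The page only names the Images/ and Captions/ folders, so everything else here is an assumption: that each patch has a caption file with the same relative path and stem, that captions are UTF-8 text files ending in .txt, and that patches use common image extensions. The function name pair_patches_with_captions is hypothetical; adjust the extensions and matching rule to the actual archive layout.

```python
from pathlib import Path


def pair_patches_with_captions(root, image_exts=(".png", ".jpg", ".jpeg"),
                               caption_ext=".txt"):
    """Return a list of (image_path, caption_text) pairs matched by file stem.

    Assumed layout (not confirmed by the dataset page):
        <root>/Images/<name>.<image ext>
        <root>/Captions/<name>.txt
    """
    root = Path(root)
    images_dir = root / "Images"
    captions_dir = root / "Captions"

    pairs = []
    for image_path in sorted(images_dir.rglob("*")):
        # Skip directories and files that are not image patches.
        if not image_path.is_file() or image_path.suffix.lower() not in image_exts:
            continue
        # Assumed convention: same relative path, caption extension swapped in.
        relative = image_path.relative_to(images_dir).with_suffix(caption_ext)
        caption_path = captions_dir / relative
        if not caption_path.is_file():
            continue  # No caption found for this patch; leave it out.
        caption = caption_path.read_text(encoding="utf-8").strip()
        pairs.append((image_path, caption))
    return pairs
```

Calling pair_patches_with_captions("path/to/extracted/dataset") would then give a list that can be wrapped in whatever dataset class the training framework expects. Patches without a matching caption are skipped silently in this sketch; logging them instead may be preferable when checking the extraction.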