Dermoscopic Dataset for the "Dermoscopic Image Classification with Neural Style Transfer" Manuscript
The dermoscopic images considered in the paper "Dermoscopic Image Classification with Neural Style Transfer" are available for public download from the ISIC database (https://www.isic-archive.com/#!/topWithHeader/wideContentTop/main). These are 24-bit JPEG images with a typical resolution of 768 × 512 pixels. However, not all images in the database are in satisfactory condition. We therefore constructed a high-quality, balanced dataset of 1000 images (500 malignant and 500 benign) by omitting any image that satisfies at least one of the following conditions: (a) the tumor does not fit entirely within the image frame, (b) abundant hair obscures a significant portion of the lesion, or (c) the image is a duplicate or augmented version of another image. This data cleaning is necessary to ensure accurate border detection, reliable feature extraction, a fair comparison of classification performance, and satisfactory quality control of the style-transferred images without interference from non-lesion information. The images are resized to 224 × 224 pixels using bilinear interpolation to match the input dimension of the VGG19 network and to reduce the computational cost of our analysis. A lesion segmentation mask for each image, produced by a U-net trained on the PH2 dataset (https://www.fc.up.pt/addi/ph2%20database.html), is also provided. The manuscript also considers the ISIC 2016 and ISIC 2017 skin lesion classification challenges, both available for download at https://challenge.isic-archive.com/data.
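The resizing step described above can be sketched in a few lines. This is a minimal illustration, not the authors' script: it assumes the Pillow library is used, and the function name `resize_for_vgg19` is hypothetical.

```python
from PIL import Image  # Pillow; an assumed tool, not necessarily the authors' choice


def resize_for_vgg19(img: Image.Image, size=(224, 224)) -> Image.Image:
    # Bilinear downsampling from the typical 768 x 512 resolution
    # to the 224 x 224 input dimension expected by VGG19.
    return img.convert("RGB").resize(size, resample=Image.BILINEAR)
```

In practice each JPEG in the dataset would be opened with `Image.open(path)` and passed through this function before feature extraction or style transfer.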
This is the documentation for the datasets described and analyzed in the manuscript "Dermoscopic Image Classification with Neural Style Transfer" by Yutong Li, Ruoqing Zhu, Mike Yeh, and Annie Qu.
This document only covers the constructed dataset of 1000 images. The ISIC 2016 and 2017 challenge datasets can both be downloaded from "https://challenge.isic-archive.com/data", and the PH2 dataset used to train the U-net that generates the segmentation masks for the constructed dataset can be downloaded from "https://www.fc.up.pt/addi/ph2%20database.html".
There are five folders in this dataset.
1. Original_Images: 1000 dermoscopic images selected from the ISIC database and resized to 224-by-224 pixels. Image artifacts such as hair and air bubbles are removed.
2. Color_Normalized_Images: The Shades of Gray algorithm is applied to the original images for color and illumination normalization. These images are regarded as the "raw images" in the manuscript.
3. Content_Image: This folder contains a single content image, computed as the pixel-wise average of each of the 3 RGB channels across all the images in Original_Images. This is the content image used throughout the style transfer process.
4. Segmentation_Masks: This folder contains 1000 segmentation masks generated by a U-net trained on the PH2 dataset.
5. Stylized_Images: This folder contains 1000 generated images that yielded the best classification performance using the proposed style-transfer pipeline.
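The Shades of Gray normalization used to produce Color_Normalized_Images can be sketched as follows. This is a generic NumPy implementation of the algorithm (Minkowski p-norm illuminant estimate, typically with p = 6), not the authors' code; the function name `shades_of_gray` and the choice of p are assumptions.

```python
import numpy as np


def shades_of_gray(img: np.ndarray, p: int = 6) -> np.ndarray:
    # img: float array in [0, 1] with shape (H, W, 3).
    img = img.astype(np.float64)
    # Estimate the illuminant per channel with a Minkowski p-norm mean.
    illum = np.power(np.mean(np.power(img, p), axis=(0, 1)), 1.0 / p)
    illum = illum / np.linalg.norm(illum)  # unit-length illuminant estimate
    # Rescale channels so the estimated illuminant becomes achromatic (gray).
    img = img / (illum * np.sqrt(3.0))
    return np.clip(img, 0.0, 1.0)
```

After normalization, a uniformly color-cast image ends up with equal channel intensities, which is the intended illumination-correction effect.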
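The content image in Content_Image is described as a pixel-wise average over the dataset. A minimal NumPy sketch of that averaging step (the helper name `build_content_image` is hypothetical, and the uint8 input format is an assumption):

```python
import numpy as np


def build_content_image(images) -> np.ndarray:
    # images: iterable of (224, 224, 3) uint8 arrays, e.g. the resized
    # dermoscopic images from Original_Images.
    stack = np.stack([img.astype(np.float64) for img in images])
    # Pixel-wise, channel-by-channel mean across the whole dataset.
    return np.clip(stack.mean(axis=0), 0, 255).astype(np.uint8)
```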
Scripts for pre-processing the images, running the proposed style-transfer algorithm, and performing the classification pipeline can be found at "cralo31/dermoST" (https://github.com/cralo31/dermoST) on GitHub (in progress).