Multi-Label Retinal Disease (MuReD) Dataset


Abstract 

Early detection of retinal diseases is one of the most important means of preventing partial or permanent blindness in patients. A major stumbling block for manual retinal examination is the shortage of qualified medical personnel per capita to diagnose diseases. Computer-aided diagnosis (CAD) systems have proven very effective in helping physicians reduce the time taken to make a diagnosis and minimize variability in image interpretation, but they are not flexible enough to accommodate the simultaneous presence of multiple retinal diseases, which is a common situation in real-world applications. In recent years, a few datasets focusing on the classification of multiple retinal pathologies present at the same time, i.e., multi-label classification, have been proposed, but they all share several problems: a narrow range of pathologies to classify, a high level of class imbalance, a low number of samples for underrepresented labels, and no assurance of image quality, among others. All these problems hinder the performance of any model trained on these datasets, leading to poor robustness, lack of generalization, and reduced trustworthiness of its predictions.

To address these problems, we constructed the Multi-Label Retinal Diseases (MuReD) dataset, using images collected from three different state-of-the-art sources, i.e., ARIA, STARE, and RFMiD datasets, and performing a sequence of post-processing steps to ensure the quality of the images, a wide range of diseases to classify, and a sufficient number of samples per disease label.

The MuReD dataset consists of 2208 images with 20 different labels, with varying image quality and resolution, while at the same time ensuring a minimal degree of quality in the data and a sufficient number of samples per label. To the best of our knowledge, the MuReD dataset is the only publicly available dataset that applies a sequence of post-processing steps to ensure the quality of the images, the variety of pathologies, and the number of samples per label, resulting in increased data quality and a significant reduction of the class imbalance present in publicly available datasets.

It is envisaged that the MuReD dataset will enable the creation of more robust, general, and trustworthy models for the automatic detection and classification of retinal diseases.

Instructions: 

The dataset contains 2 files and a folder:

1. The file "train_data.csv" contains the name of the images that represent the training set, along with the 20 different labels.

2. The file "val_data.csv" contains the name of the images that represent the validation set, along with the 20 different labels.

3. The folder "images" contains all the images that compose the MuReD dataset, both training and validation images.
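As a sketch, the CSV splits described above can be read with Python's standard library. The column names used here (ID, DR, NORMAL, MH) are illustrative assumptions, not the dataset's actual headers; the pattern assumed is one image-filename column followed by the 20 binary label columns:

```python
import csv
import io

# Illustrative sketch: a tiny in-memory stand-in for "train_data.csv".
# Assumed layout: first column = image filename, remaining columns =
# binary (0/1) disease labels. Headers below are hypothetical.
csv_text = """ID,DR,NORMAL,MH
img_0001.png,1,0,0
img_0002.tif,0,1,0
img_0003.png,1,0,1
"""
# In practice: reader = csv.DictReader(open("train_data.csv"))
reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)

label_names = reader.fieldnames[1:]
# Per-label positive counts, useful for checking class imbalance.
label_counts = {name: sum(int(r[name]) for r in rows) for name in label_names}
print(label_counts)  # {'DR': 2, 'NORMAL': 1, 'MH': 1}
```

The same loop works for "val_data.csv", since both splits share the same column layout.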

 

The images come in two different formats, i.e., .tiff and .png, since they were collected from different sources.

There is no single image resolution. Since the images come from different sources, resolution varies from 520x520 to 3400x2800 depending on the source of the image.
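Because resolutions and formats vary across sources, images are typically normalized to a common size before training. A minimal sketch using Pillow (the function name and target size of 512x512 are assumptions, and the interpolation filter is a design choice):

```python
from PIL import Image

def load_and_resize(path, size=(512, 512)):
    """Open a fundus image (.tiff or .png) and resize it to a common shape.

    Pillow dispatches on file content, so both formats are handled by the
    same call. Converting to RGB normalizes any grayscale/palette images.
    """
    img = Image.open(path).convert("RGB")
    return img.resize(size, Image.BILINEAR)

# Demo with a synthetic image standing in for a real fundus photograph
# at the dataset's largest reported resolution.
Image.new("RGB", (3400, 2800), "black").save("demo.png")
resized = load_and_resize("demo.png")
print(resized.size)  # (512, 512)
```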

 

All images were collected from the sources below, and different post-processing cleaning steps were applied:

1. ARIA dataset (http://www.damianjjfarnell.com/?page_id=276)

2. STARE dataset (https://cecas.clemson.edu/~ahoover/stare/)

3. RFMiD dataset (https://dx.doi.org/10.21227/s3g7-st65)