Standard Dataset

COVID-19 Posteroanterior Chest X-Ray fused (CPCXR) dataset

Citation Author(s):
Narinder Singh Punn (Research Scholar at Indian Institute of Information Technology Allahabad, India)
Sonali Agarwal (Professor at Indian Institute of Information Technology Allahabad, India)
Submitted by:
Narinder Singh Punn
DOI:
10.21227/x2r3-xk48

Abstract

The dataset is generated by fusing three publicly available datasets: the COVID-19 CXR image dataset (https://github.com/ieee8023/covid-chestxray-dataset), the Radiological Society of North America (RSNA) dataset (https://www.kaggle.com/c/rsna-pneumonia-detection-challenge), and the Montgomery County (NLM(MC)) dataset collected by the U.S. National Library of Medicine (USNLM) (https://lhncbc.nlm.nih.gov/publication/pub9931). These datasets were annotated by expert radiologists. The fused dataset consists of samples labeled as COVID-19, Tuberculosis, Other pneumonia (SARS, MERS, etc.), and Normal. The dataset can be used to train and evaluate deep learning and machine learning models on binary and multi-class classification problems.

Instructions:

The main manuscript describing the dataset is available at https://link.springer.com/article/10.1007%2Fs10489-020-01900-3

The dataset is already split into training, validation, and test sets. The label associated with each image is provided in a dedicated *.csv file for each set.
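As a minimal sketch of reading one of the per-split label files, the Python snippet below parses a CSV into a mapping from image name to integer label. The column names `image` and `label`, the helper `read_labels`, and the sample file names are assumptions made for illustration only; check the header of the actual *.csv files and adjust accordingly.

```python
import csv
import io


def read_labels(fileobj, image_col="image", label_col="label"):
    """Return {image_name: int_label} from an open CSV file object.

    image_col and label_col are ASSUMED column names (hypothetical);
    replace them with the names found in the real CSV header.
    """
    reader = csv.DictReader(fileobj)
    return {row[image_col]: int(row[label_col]) for row in reader}


# In-memory stand-in for one of the *.csv files (made-up file names).
sample = io.StringIO("image,label\nimg_001.png,0\nimg_002.png,1\n")
labels = read_labels(sample)
print(labels)
```

For a file on disk, the same helper can be called as `read_labels(open(path, newline=""))`, where `path` points to the CSV of the split being loaded.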

The class distribution and assigned labels in the dataset, given as (label, number of images), are as follows: Normal - (0, 533), COVID-19 - (1, 108), Other pneumonia - (2, 515), and Tuberculosis - (3, 58).
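The class counts listed above are imbalanced, so a model trained on them may benefit from class weighting. The sketch below encodes the label mapping and counts from this page, computes inverse-frequency class weights (a common heuristic, not necessarily what the dataset authors used), and shows one possible way to collapse the four labels into a binary problem. The choice of COVID-19 versus everything else, and the helper names `class_weights` and `to_binary`, are assumptions for illustration.

```python
# Label mapping and per-class image counts as stated on this page.
CLASS_NAMES = {0: "Normal", 1: "COVID-19", 2: "Other pneumonia", 3: "Tuberculosis"}
CLASS_COUNTS = {0: 533, 1: 108, 2: 515, 3: 58}

total = sum(CLASS_COUNTS.values())  # total number of listed images


def class_weights(counts):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    n = sum(counts.values())
    k = len(counts)
    return {label: n / (k * c) for label, c in counts.items()}


def to_binary(label, positive=1):
    """Collapse a multi-class label to 1 (positive class) or 0 (all others).

    Treating COVID-19 (label 1) as the positive class is an assumption.
    """
    return 1 if label == positive else 0


weights = class_weights(CLASS_COUNTS)
for label, w in weights.items():
    print(f"{CLASS_NAMES[label]}: weight {w:.3f}")
```

With this weighting, the rarest class (Tuberculosis) receives the largest weight and the most common class (Normal) the smallest.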