BIMCV COVID-19+: a large annotated dataset of RX and CT images from COVID-19 patients
BIMCV-COVID19+ is a large dataset of chest X-ray (CXR) images, in both CR and DX modalities, and computed tomography (CT) images of COVID-19 patients, along with their radiographic findings, pathologies, results of polymerase chain reaction (PCR), immunoglobulin G (IgG) and immunoglobulin M (IgM) diagnostic antibody tests, and radiographic reports from the Valencian Region Medical Image Bank (BIMCV). The findings are mapped onto standard Unified Medical Language System (UMLS) terminology and cover a wide spectrum of thoracic entities, in contrast to the much smaller number of entities annotated in previous datasets. Images are stored in high resolution, and entities are localized with anatomical labels in the Medical Imaging Data Structure (MIDS) format. In addition, 23 images were annotated by a team of expert radiologists to include semantic segmentation of radiographic findings. Extensive additional information is also provided, including the patient’s demographic information, the type of projection and the acquisition parameters for the imaging study, among others. These iterations of the database include 7377 CR, 9463 DX and 6687 CT studies.
This work is first and foremost an open and free contribution from the authors in the working group, with support from the Regional Ministry of Innovation, Universities, Science and Digital Society through a grant awarded under decree 51/2020 by the Valencian Innovation Agency (Spain), and from the Regional Ministry of Health in the Valencia Region. This research is also supported by the University of Alicante’s UACOVID-19-18 project.
Part of the infrastructure used has been cofunded by the European Union through the Operational Program of the European Fund of Regional Development (FEDER) of the Valencian Community 2014-2020. The Medical Image Bank of the Valencian Community was partially funded by the European Union’s Horizon 2020 Framework Programme under grant agreement 688945 (Euro-BioImaging PrepPhase II).
This work is undertaken in the context of the DeepHealth project, “Deep-Learning and HPC to Boost Biomedical Applications for Health” (https://deephealth-project.eu/), which has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825111.
Once all the compressed files have been downloaded, run 00_extract_data.sh to decompress them correctly. For more information, see the links on this page.