BIMCV COVID-19+: a large annotated dataset of RX and CT images from COVID-19 patients

Citation Author(s):
Maria de la Iglesia Vayá, Unidad Mixta de Imagen Biomédica FISABIO-CIPF.
Jose Manuel Saborit-Torres, Unidad Mixta de Imagen Biomédica FISABIO-CIPF.
Joaquim Angel Montell Serrano, Unidad Mixta de Imagen Biomédica FISABIO-CIPF.
Elena Oliver-Garcia, Unidad Mixta de Imagen Biomédica FISABIO-CIPF.
Antonio Pertusa, Universidad de Alicante, Spain.
Aurelia Bustos, Medbravo.
Miguel Cazorla, Hospital San Juan de Alicante, Spain.
Joaquin Galant, Universidad de Alicante, Spain.
Xavier Barber, Universidad Miguel Hernández, Spain.
Domingo Orozco-Beltrán, Universidad Miguel Hernández, Spain.
Francisco García-García, Unidad Mixta de Imagen Biomédica FISABIO-CIPF & Bioinformatics & Biostatistics Unit Principe Felipe Research Center, Valencia, Spain.
Marisa Caparrós, Unidad Mixta de Imagen Biomédica FISABIO-CIPF.
Germán González, Universidad de Alicante, Spain & Sierra Research SL.
Jose María Salinas, Unidad Mixta de Imagen Biomédica FISABIO-CIPF & Hospital San Juan de Alicante, Spain.
Submitted by: Maria De la Igl...
Last updated: Tue, 03/16/2021 - 09:01
DOI: 10.21227/w3aw-rv39

Abstract 

BIMCV-COVID19+ is a large dataset of chest X-ray (CXR; CR and DX) and computed tomography (CT) images of COVID-19 patients, together with their radiographic findings, pathologies, polymerase chain reaction (PCR) tests, immunoglobulin G (IgG) and immunoglobulin M (IgM) diagnostic antibody tests, and radiographic reports from the Medical Imaging Databank in the Valencian Region Medical Image Bank (BIMCV). The findings are mapped onto standard Unified Medical Language System (UMLS) terminology and cover a wide spectrum of thoracic entities, in contrast to the much smaller number of entities annotated in previous datasets. Images are stored in high resolution, and entities are localized with anatomical labels in the Medical Imaging Data Structure (MIDS) format. In addition, 23 images were annotated by a team of expert radiologists to include semantic segmentation of radiographic findings. Extensive further information is provided, including the patients' demographic information, the type of projection and the acquisition parameters for each imaging study, among others. These iterations of the database include 7377 CR, 9463 DX and 6687 CT studies.

 

This work is first and foremost an open and free contribution from the authors in the working group, with support from the Regional Ministry of Innovation, Universities, Science and Digital Society through a grant awarded under decree 51/2020 by the Valencian Innovation Agency (Spain), and from the Regional Ministry of Health of the Valencia Region. This research is also supported by the University of Alicante’s UACOVID-19-18 project.

 

Part of the infrastructure used has been cofunded by the European Union through the Operational Program of the European Fund of Regional Development (FEDER) of the Valencian Community 2014-2020. The Medical Image Bank of the Valencian Community was partially funded by the European Union’s Horizon 2020 Framework Programme under grant agreement 688945 (Euro-BioImaging PrepPhase II).

 

This work is undertaken in the context of the DeepHealth project, “Deep-Learning and HPC to Boost Biomedical Applications for Health” (https://deephealth-project.eu/), which has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825111.

 

Instructions: 

Once all the compressed files have been downloaded, use 00_extract_data.sh to decompress them correctly. For more information, see the links on this page.
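
The extraction step can be sketched as a small Python wrapper. This is only an illustration under stated assumptions: it assumes that 00_extract_data.sh sits in the same directory as the downloaded archives, that it takes no arguments, and that it is meant to be run from that directory. None of this is confirmed by this page, and the helper name run_extraction is hypothetical.

```python
import subprocess
from pathlib import Path


def run_extraction(download_dir, script_name="00_extract_data.sh"):
    """Run the extraction script from the directory holding the downloads.

    Assumption: the script takes no arguments and expects to be run
    from the directory that contains the compressed files.
    """
    download_dir = Path(download_dir).resolve()
    script = download_dir / script_name
    if not script.is_file():
        raise FileNotFoundError(f"{script_name} not found in {download_dir}")
    # Invoke through bash so the script does not need the executable bit.
    # check=True raises CalledProcessError if the script exits with an error.
    result = subprocess.run(["bash", script.name], cwd=download_dir, check=True)
    return result.returncode
```

For instance, run_extraction(".") would run the script in the current directory and raise an error if the script is missing or exits with a non-zero status.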