Abstract

The Lung Image Database Consortium image collection (LIDC-IDRI) consists of diagnostic and lung cancer screening thoracic computed tomography (CT) scans with marked-up annotated lesions. It is a web-accessible international resource for development, training, and evaluation of computer-assisted diagnostic (CAD) methods for lung cancer detection and diagnosis. Initiated by the National Cancer Institute (NCI), further advanced by the Foundation for the National Institutes of Health (FNIH), and accompanied by the Food and Drug Administration (FDA) through active participation, this public-private partnership demonstrates the success of a consortium founded on a consensus-based process.

Seven academic centers and eight medical imaging companies collaborated to create this data set, which contains 1018 cases. Each subject includes images from a clinical thoracic CT scan and an associated XML file that records the results of a two-phase image annotation process performed by four experienced thoracic radiologists. In the initial blinded-read phase, each radiologist independently reviewed each CT scan and marked lesions belonging to one of three categories ("nodule ≥3 mm," "nodule <3 mm," and "non-nodule ≥3 mm"). In the subsequent unblinded-read phase, each radiologist independently reviewed their own marks along with the anonymized marks of the three other radiologists to render a final opinion. The goal of this process was to identify as completely as possible all lung nodules in each CT scan without requiring forced consensus.

Instructions:

Context

Name: The Lung Image Database Consortium image collection (LIDC-IDRI)
Creator: K Scott Mader
License: https://creativecommons.org/licenses/by/4.0/
Keywords: Medical Imaging

Competitions like LUNA (http://luna16.grand-challenge.org) and the Kaggle Data Science Bowl 2017 (https://www.kaggle.com/c/data-science-bowl-2017) involve processing CT images of the lungs and trying to find lesions in them. To find disease reliably in these images, it is important to first segment the lungs accurately. This dataset is a collection of 2D and 3D images with manually segmented lungs.

Challenge

Come up with an algorithm for accurately segmenting lungs and measuring important clinical parameters (lung volume, PD, etc.).
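As a rough illustration of one of these parameters, lung volume can be estimated from a binary segmentation mask by counting the voxels labelled as lung and multiplying by the physical volume of one voxel. The sketch below assumes the mask is a NumPy array and that the voxel spacing in millimetres is known from the scan metadata; the function name `lung_volume_ml` and its arguments are hypothetical and not part of the dataset.

```python
import numpy as np


def lung_volume_ml(lung_mask, voxel_spacing_mm):
    """Estimate lung volume in millilitres from a binary mask.

    lung_mask        : array-like, non-zero where a voxel belongs to the lung
    voxel_spacing_mm : one spacing value per mask axis, in millimetres
                       (assumed to come from the scan metadata)
    """
    mask = np.asarray(lung_mask, dtype=bool)
    spacing = np.asarray(voxel_spacing_mm, dtype=float)
    if spacing.shape != (mask.ndim,):
        raise ValueError("need exactly one spacing value per mask axis")

    # Physical volume of a single voxel in cubic millimetres.
    voxel_volume_mm3 = float(np.prod(spacing))

    # Count lung voxels, then convert mm^3 to mL (1 mL = 1000 mm^3).
    return int(mask.sum()) * voxel_volume_mm3 / 1000.0


# Toy example: a 2 x 2 x 2 block of "lung" voxels inside a 4 x 4 x 4 volume.
toy_mask = np.zeros((4, 4, 4), dtype=bool)
toy_mask[1:3, 1:3, 1:3] = True
toy_volume = lung_volume_ml(toy_mask, (2.0, 0.5, 0.5))
```

The accuracy of such an estimate depends entirely on the quality of the segmentation, which is why the manually segmented lungs serve as the reference here.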

Percentile Density (PD)

The PD is the density (in Hounsfield units) below which the given percentile of pixels in the image fall. The table includes the 5th and 95th percentiles for reference. For smokers this value is often high, indicating a build-up of other material in the lungs.
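A minimal sketch of the PD calculation, assuming the CT values are already in Hounsfield units and that the percentile is taken over the pixels inside a lung mask (the description above does not say whether the statistic is restricted to the mask, so that restriction is an assumption). The function name `percentile_density` is hypothetical.

```python
import numpy as np


def percentile_density(hu_image, lung_mask, percentiles=(5, 95)):
    """Return {percentile: HU value} for the pixels selected by lung_mask.

    hu_image  : array-like of CT values in Hounsfield units
    lung_mask : array-like of the same shape, non-zero inside the lungs
                (restricting to the mask is an assumption of this sketch)
    """
    hu = np.asarray(hu_image, dtype=float)
    mask = np.asarray(lung_mask, dtype=bool)
    if hu.shape != mask.shape:
        raise ValueError("image and mask must have the same shape")

    lung_values = hu[mask]
    if lung_values.size == 0:
        raise ValueError("mask selects no pixels")

    # np.percentile returns the value below which the given share of the
    # selected pixels fall (linear interpolation between samples by default).
    values = np.percentile(lung_values, percentiles)
    return {p: float(v) for p, v in zip(percentiles, values)}


# Toy example: 101 "lung" pixels with values 0..100, plus 10 bright
# non-lung pixels that the mask excludes.
toy_hu = np.concatenate([np.arange(101.0), np.full(10, 1000.0)])
toy_mask = np.concatenate([np.ones(101, dtype=bool), np.zeros(10, dtype=bool)])
toy_pd = percentile_density(toy_hu, toy_mask)
```

In the toy example the 5th and 95th percentiles come out as 5.0 and 95.0, because the masked values are the integers 0 to 100.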

Comments

Supporting data_Lung Database

Submitted by tian wang on Wed, 10/27/2021 - 02:31

I am a postgraduate student researcher. Please support me by accepting my application. I need to obtain the data I worked on to discover lung cancer.

Submitted by fatoma alwerfaly on Mon, 09/16/2024 - 14:49

Good job

Submitted by Hamit AKSOY on Fri, 05/06/2022 - 08:51

I hope this message finds you well. I am currently pursuing my Master’s degree, and as part of my coursework, I have been assigned a project that aligns with Healthcare and Artificial Intelligence, specifically focusing on Biomedical Image Classification. I have come across your dataset and believe it would be incredibly useful for my research and project studies. Access to this dataset would greatly enhance my ability to conduct meaningful analyses and achieve the objectives of my project.

I would be deeply grateful if you could provide me access to this dataset. Your support would significantly contribute to my research efforts.

Thank you for your time and consideration.

Submitted by Kavya Shah on Mon, 09/23/2024 - 09:33

Dataset Files

Files have not been uploaded for this dataset