Prediction of biochemical prostate cancer recurrence from any Gleason score using robust tissue structure and clinically available information
- Submitted by: Fanny Casado-Pena
- Last updated: Sun, 08/18/2024 - 10:56
- DOI: 10.21227/1g5n-vd67
Abstract
Biopsy information and protein Prostate-Specific Antigen (PSA) levels are the most robust information available to oncologists worldwide to diagnose and decide therapies for prostate cancer patients. However, prostate cancer presents a high risk of recurrence, and the technologies used to evaluate it demand more complex resources. This paper aims to predict Biochemical Recurrence (BCR) based on Whole Slide Images (WSI) of biopsies, Gleason scores, and PSA levels. A U-net model was used to segment phenotypic features and trained on images from the Prostate Cancer Grade Assessment (PANDA) database to segment tumorous regions from pre-processed and scored WSI of biopsies. Then, the model was tested on data from publicly available repositories achieving an Intersection over Union of 87%. Tissue features, Gleason scores, and PSA levels provided high accuracy and precision in classifying patients according to their risk of presenting recurrence, for any Gleason score sampled. The trained classifier model demonstrated a 76% relative accuracy, and a precision of 69.7% for patients experiencing recurrences before 24 months. Our results provide a robust, cost-efficient approach using already available information to predict the risk of BCR.
The U-net deep learning model was trained and tested on H&E-stained whole slide images (WSI) of prostate pathology from two distinct sources. The training set was built from prostate biopsies in the Prostate Cancer Grade Assessment (PANDA) Challenge (https://www.kaggle.com/competitions/prostate-cancer-grade-assessment), hosted by Kaggle Inc. and accepted for the MICCAI 2020 conference. Diagnostic slides for the testing set were sourced from The Cancer Genome Atlas (TCGA) web portal (https://www.cancer.gov/tcga). In total, 800 WSI were used for training and 500 WSI for testing. In the PANDA dataset, each WSI came with corresponding masks graded by multiple pathologists and an associated Gleason score. In the TCGA dataset, each WSI was accompanied by Gleason pattern masks, and the ground-truth Gleason scores were provided by TCGA based on patient reports.
Most of the WSI in the PANDA dataset correspond to patients with Gleason scores of 6 and 7. However, building the model required a balanced dataset to ensure reproducibility. We therefore selected 200 patients graded with a Gleason score of 6, 200 with a Gleason score of 7, 150 with a Gleason score of 8, 150 with a Gleason score of 9, and 100 with a Gleason score of 10, for 800 patients in total. Patients were selected randomly within each Gleason score, with one WSI per patient. The testing dataset consisted of the 500 WSI from the TCGA web portal (Table 1). The datasets analyzed during the current study are available in the [NAME] repository, [PERSISTENT WEB LINK TO DATASETS].
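The per-score random selection described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `sample_by_gleason`, the `(slide_id, gleason_score)` record layout, and the fixed seed are assumptions made for the example. Only the per-score counts come from the text.

```python
import random
from collections import defaultdict

# Per-score quotas stated in the text (200 + 200 + 150 + 150 + 100 = 800 WSI).
QUOTAS = {6: 200, 7: 200, 8: 150, 9: 150, 10: 100}


def sample_by_gleason(records, quotas=QUOTAS, seed=0):
    """Randomly pick quotas[score] slides for each Gleason score.

    records: iterable of (slide_id, gleason_score) pairs. This layout is
    hypothetical; the real PANDA metadata may be organised differently.
    """
    # Group slide IDs by their Gleason score.
    by_score = defaultdict(list)
    for slide_id, score in records:
        by_score[score].append(slide_id)

    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    selected = {}
    for score, n in quotas.items():
        pool = by_score.get(score, [])
        if len(pool) < n:
            raise ValueError(
                f"only {len(pool)} slides with Gleason {score}, need {n}"
            )
        # Sampling without replacement keeps one WSI per patient.
        selected[score] = rng.sample(pool, n)
    return selected
```

Calling `sample_by_gleason(records)` returns a dictionary mapping each Gleason score to its selected slide IDs. Sampling within each score, rather than from the whole pool, is what keeps the Gleason 6 and 7 majority from dominating the training set.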
In addition to these datasets, another dataset of 110 H&E-stained images consisting solely of nuclei, from the 2018 Data Science Bowl Competition (https://www.kaggle.com/c/data-science-bowl-2018), was used in an isolated experiment during training to verify epithelial cell segmentation. Furthermore, as a reference standard, an experienced oncologist provided ground truths outlining glands, lumen, and nuclei for 230 tiles ranging from Gleason 6 to 10.
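The abstract reports segmentation quality as Intersection over Union (IoU), which compares a predicted mask against a reference mask such as the ground truths described here. A minimal sketch of that metric for binary masks follows. The function name `iou`, the nested-list mask format, and the convention of returning 1.0 for two empty masks are assumptions for illustration, not details from the study.

```python
def iou(pred, truth):
    """Intersection over Union of two binary masks.

    pred, truth: nested lists of 0/1 values with identical shape
    (an assumed format; real masks would usually be image arrays).
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of rows")

    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        if len(pred_row) != len(truth_row):
            raise ValueError("mask rows must have the same length")
        for p, t in zip(pred_row, truth_row):
            if p and t:
                intersection += 1  # pixel labelled in both masks
            if p or t:
                union += 1  # pixel labelled in at least one mask

    # Two empty masks agree completely; treat that case as IoU = 1.0.
    return intersection / union if union else 1.0


predicted = [[1, 1, 0],
             [0, 1, 0]]
reference = [[1, 0, 0],
             [0, 1, 1]]
print(iou(predicted, reference))  # 2 shared pixels / 4 labelled pixels = 0.5
```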
The subsequent BCR classification tasks were trained and tested on 150 patients, all from the TCGA dataset. Among the 500 TCGA patients, 99 presented BCR, of whom 71 had the selected genes. To balance the set, 79 patients without BCR and with similar disease-free time were randomly chosen.
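One way to assemble such a 150-patient cohort is sketched below. The text does not say how "similar disease-free time" was enforced, so this example assumes one simple rule: controls are drawn only from patients without BCR whose disease-free time lies within the range observed among the selected BCR patients. The function name `build_bcr_cohort` and the dictionary keys `bcr`, `has_genes`, and `dfs_months` are hypothetical.

```python
import random


def build_bcr_cohort(patients, n_controls=79, seed=0):
    """Select BCR cases and a random set of non-BCR controls.

    patients: list of dicts with hypothetical keys
      'bcr' (bool), 'has_genes' (bool), 'dfs_months' (float).
    """
    # Cases: patients with recurrence who also have the selected genes.
    cases = [p for p in patients if p["bcr"] and p["has_genes"]]
    if not cases:
        raise ValueError("no BCR patients with the selected genes")

    # Assumed similarity rule: keep controls inside the cases' time range.
    shortest = min(p["dfs_months"] for p in cases)
    longest = max(p["dfs_months"] for p in cases)
    candidates = [
        p for p in patients
        if not p["bcr"] and shortest <= p["dfs_months"] <= longest
    ]
    if len(candidates) < n_controls:
        raise ValueError("not enough eligible patients without BCR")

    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    controls = rng.sample(candidates, n_controls)
    return cases, controls
```

With 71 eligible BCR patients and the default of 79 controls, the two groups add up to the 150 patients used for classification.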