Open Access
UNITOPATHO
- Submitted by: Enzo Tartaglione
- Last updated: Tue, 05/04/2021 - 08:53
- DOI: 10.21227/9fsv-tm25
Abstract
Histopathological characterization of colorectal polyps allows clinicians to tailor patients' management and follow-up, with the ultimate aim of avoiding or promptly detecting an invasive carcinoma. Colorectal polyp characterization relies on the histological analysis of tissue samples to determine the malignancy and dysplasia grade of the polyps. Deep neural networks achieve outstanding accuracy in medical pattern recognition; however, they require large sets of annotated training images.
We introduce UniToPatho, an annotated dataset of 9536 hematoxylin and eosin stained patches extracted from 292 whole-slide images, meant for training deep neural networks for colorectal polyp classification and adenoma grading. The slides were acquired with a Hamamatsu Nanozoomer S210 scanner at 20× magnification (0.4415 μm/px). Each slide belongs to a different patient and is annotated by expert pathologists according to the following six classes:
- NORM - Normal tissue;
- HP - Hyperplastic Polyp;
- TA.HG - Tubular Adenoma, High-Grade dysplasia;
- TA.LG - Tubular Adenoma, Low-Grade dysplasia;
- TVA.HG - Tubulo-Villous Adenoma, High-Grade dysplasia;
- TVA.LG - Tubulo-Villous Adenoma, Low-Grade dysplasia.
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825111, DeepHealth Project.
To load the data, we provide below an example routine for the PyTorch framework. We provide two different resolutions, 800 and 7000 μm/px.
Within each resolution, we provide .csv files containing the metadata for all the included files, comprising:
- image_id;
- label (6 classes: HP, NORM, TA.HG, TA.LG, TVA.HG, TVA.LG);
- type (4 classes: HP, NORM, HG, LG);
- reference WSI;
- reference region of interest in the WSI (roi);
- resolution (microns per pixel, mpp);
- patch coordinates (x, y, w, h).
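As an illustrative sketch of how this metadata can be inspected before building the dataset (the file name and folder layout used below, e.g. 800/train.csv, are assumptions and should be adapted to the extracted archive), the .csv files can be read directly with pandas:
# Minimal sketch: inspecting a per-resolution metadata CSV with pandas.
# The path 'unitopatho/800/train.csv' is an assumption; adapt it to your local copy.
import pandas as pd

df = pd.read_csv('unitopatho/800/train.csv')

print(df.columns.tolist())           # image_id, label, type, wsi/roi references, mpp, x, y, w, h
print(df['label'].value_counts())    # distribution over the 6 classes (HP, NORM, TA.HG, ...)
print(df['type'].value_counts())     # distribution over the 4 grouped classes (HP, NORM, HG, LG)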
Below you can find the UNITOPatho dataloader class for PyTorch. More examples can be found here.
import torch
import torchvision
import numpy as np
import cv2
import os


class UNITOPatho(torch.utils.data.Dataset):
    def __init__(self, df, T, path, target, subsample=-1, gray=False, mock=False):
        self.path = path            # root folder of the extracted dataset
        self.df = df                # metadata DataFrame loaded from the .csv file
        self.T = T                  # transform applied to each patch (e.g. torchvision transforms)
        self.target = target        # metadata column returned as the label
        self.subsample = subsample  # if > 0, progressively downsample patches to this size
        self.mock = mock            # if True, return random images instead of reading from disk
        self.gray = gray            # if True, convert patches to grayscale

        allowed_target = ['type', 'grade', 'top_label']
        if target not in allowed_target:
            print(f'Target must be in {allowed_target}, got {target}')
            exit(1)

        print(f'Loaded {len(self.df)} images')

    def __len__(self):
        return len(self.df)

    def __getitem__(self, index):
        entry = self.df.iloc[index]
        image_id = entry.image_id
        image_id = os.path.join(self.path, entry.top_label_name, image_id)

        img = None
        if self.mock:
            # Generate a random patch instead of reading from disk
            C = 1 if self.gray else 3
            img = np.random.randint(0, 255, (224, 224, C)).astype(np.uint8)
        else:
            img = cv2.imread(image_id)

            if self.subsample != -1:
                # Progressively halve the patch until it reaches the requested size
                w = img.shape[0]
                while w // 2 > self.subsample:
                    img = cv2.resize(img, (w // 2, w // 2))
                    w = w // 2
                img = cv2.resize(img, (self.subsample, self.subsample))

            if self.gray:
                img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
                img = np.expand_dims(img, axis=2)
            else:
                # OpenCV loads images as BGR; convert to RGB
                img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

        if self.T is not None:
            img = self.T(img)

        return img, entry[self.target]
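For completeness, a hedged usage sketch follows: it reads the metadata with pandas, instantiates the class above, and wraps it in a standard PyTorch DataLoader. The root path, CSV file name, and transform choices below are assumptions, not part of the dataset release.
# Usage sketch (paths and transforms are assumptions; adapt to your local copy).
import pandas as pd
import torchvision.transforms as transforms

root = 'unitopatho/800'                             # assumed location of one resolution folder
df = pd.read_csv(os.path.join(root, 'train.csv'))   # assumed CSV file name

transform = transforms.Compose([
    transforms.ToPILImage(),   # patches are returned as numpy arrays (HWC, uint8)
    transforms.Resize(224),
    transforms.ToTensor(),
])

# 'top_label' must be one of the columns of the metadata CSV
dataset = UNITOPatho(df, T=transform, path=root, target='top_label')
loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True, num_workers=4)

images, labels = next(iter(loader))
print(images.shape)   # e.g. torch.Size([32, 3, 224, 224])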
Dataset Files
- UNITOPatho.zip (274.98 GB)