

Open Access

UNITOPATHO

Citation Author(s):
Luca Bertero (Medical Sciences Department, University of Turin, 10126, Torino, Italy)
Carlo Alberto Barbano (Computer Science Department, University of Turin, 10149 Torino, Italy)
Daniele Perlo (Computer Science Department, University of Turin, 10149 Torino, Italy)
Enzo Tartaglione (Computer Science Department, University of Turin, 10149 Torino, Italy)
Paola Cassoni (Medical Sciences Department, University of Turin, 10126, Torino, Italy)
Marco Grangetto (Computer Science Department, University of Turin, 10149 Torino, Italy)
Attilio Fiandrotti (Computer Science Department, University of Turin, 10149 Torino, Italy)
Alessandro Gambella
Luca Cavallo
Submitted by:
Enzo Tartaglione
DOI:
10.21227/9fsv-tm25

Abstract

Histopathological characterization of colorectal polyps makes it possible to tailor patient management and follow-up, with the ultimate aim of avoiding or promptly detecting invasive carcinoma. Colorectal polyp characterization relies on the histological analysis of tissue samples to determine polyp malignancy and dysplasia grade. Deep neural networks achieve outstanding accuracy in medical pattern recognition; however, they require large sets of annotated training images.

We introduce UniToPatho, an annotated dataset of 9536 hematoxylin and eosin (H&E) stained patches extracted from 292 whole-slide images, intended for training deep neural networks for colorectal polyp classification and adenoma grading. The slides were acquired with a Hamamatsu Nanozoomer S210 scanner at 20× magnification (0.4415 μm/px). Each slide belongs to a different patient and was annotated by expert pathologists according to the following six classes:

 

  • NORM - Normal tissue;
  • HP - Hyperplastic Polyp;
  • TA.HG - Tubular Adenoma, High-Grade dysplasia;
  • TA.LG - Tubular Adenoma, Low-Grade dysplasia;
  • TVA.HG - Tubulo-Villous Adenoma, High-Grade dysplasia;
  • TVA.LG - Tubulo-Villous Adenoma, Low-Grade dysplasia.
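The six labels above combine a histological category with, for adenomas, a dysplasia grade. As a minimal sketch (the names `CLASS_NAMES` and `split_label` are illustrative and not part of the dataset tooling), the labels can be looked up and split like this:

```python
# Full names for the six UniToPatho labels, as listed above.
CLASS_NAMES = {
    "NORM": "Normal tissue",
    "HP": "Hyperplastic Polyp",
    "TA.HG": "Tubular Adenoma, High-Grade dysplasia",
    "TA.LG": "Tubular Adenoma, Low-Grade dysplasia",
    "TVA.HG": "Tubulo-Villous Adenoma, High-Grade dysplasia",
    "TVA.LG": "Tubulo-Villous Adenoma, Low-Grade dysplasia",
}


def split_label(label):
    """Split a label into (histology, grade); grade is None for non-adenomas."""
    if label not in CLASS_NAMES:
        raise ValueError(f"Unknown label: {label!r}")
    # "TVA.HG" -> ("TVA", ".", "HG"); "NORM" -> ("NORM", "", "")
    histology, _, grade = label.partition(".")
    return histology, (grade or None)


for name in CLASS_NAMES:
    print(name, split_label(name))
```

Only adenoma labels (TA, TVA) carry a grade suffix, so the grade is returned as `None` for NORM and HP.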

 


This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825111, DeepHealth Project.

 

 

Instructions:

To load the data, we provide below an example routine for PyTorch. We provide two different resolutions, 800 and 7000 μm/px.

For each resolution, we provide .csv files containing the metadata for all included files:

  • image_id;
  • label (6 classes - HP, NORM, TA.HG, TA.LG, TVA.HG, TVA.LG);
  • type (4 classes - HP, NORM, HG, LG);
  • reference WSI;
  • reference region of interest in WSI (roi);
  • resolution (microns per pixel, mpp);
  • coordinates for the patch (x, y, w, h).
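The relation between the 6-class label and the 4-class type can be sketched as follows. This is an assumption inferred from the class lists above (adenoma labels collapse to their grade, while HP and NORM stay unchanged), and the column names and rows in the in-memory sample are hypothetical; the headers in the distributed .csv files may differ.

```python
import csv
import io
from collections import Counter

# Hypothetical metadata excerpt; real header names and values may differ.
SAMPLE_CSV = """image_id,label,type
a.png,TA.HG,HG
b.png,TVA.LG,LG
c.png,NORM,NORM
d.png,HP,HP
e.png,TVA.HG,HG
"""


def label_to_type(label):
    """Collapse a 6-class label to the 4-class type (HP, NORM, HG, LG)."""
    # Adenoma labels look like "TA.HG"; the part after the dot is the grade.
    # Labels without a dot (NORM, HP) are returned unchanged.
    return label.split(".")[-1]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Rows whose stored type disagrees with the type derived from the label.
mismatches = [r["image_id"] for r in rows if label_to_type(r["label"]) != r["type"]]

# Class balance for the 4-class task.
type_counts = Counter(r["type"] for r in rows)

print(mismatches)
print(dict(type_counts))
```

A check like this is a cheap way to confirm that a metadata file is internally consistent before training on either target.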

Below is the UNITOPatho dataset class for PyTorch (a torch.utils.data.Dataset subclass). More examples can be found here.


import os

import cv2
import numpy as np
import torch
import torchvision


class UNITOPatho(torch.utils.data.Dataset):
    def __init__(self, df, T, path, target, subsample=-1, gray=False, mock=False):
        self.path = path            # root folder containing the images
        self.df = df                # pandas DataFrame with the metadata
        self.T = T                  # optional transform applied to each image
        self.target = target        # name of the column returned as the label
        self.subsample = subsample  # output side length in pixels, -1 to disable
        self.mock = mock            # return random images instead of reading files
        self.gray = gray            # convert images to single-channel grayscale

        allowed_target = ['type', 'grade', 'top_label']
        if target not in allowed_target:
            raise ValueError(f'Target must be in {allowed_target}, got {target}')

        print(f'Loaded {len(self.df)} images')

    def __len__(self):
        return len(self.df)

    def __getitem__(self, index):
        entry = self.df.iloc[index]
        # Images are stored in one subfolder per top-level label
        image_path = os.path.join(self.path, entry.top_label_name, entry.image_id)

        if self.mock:
            # Random 224x224 image, useful for testing the pipeline without data
            C = 1 if self.gray else 3
            img = np.random.randint(0, 256, (224, 224, C)).astype(np.uint8)
        else:
            img = cv2.imread(image_path)
            if img is None:
                raise FileNotFoundError(f'Could not read image: {image_path}')

            if self.subsample != -1:
                # Halve the (square) image repeatedly, then resize to the target
                w = img.shape[0]
                while w//2 > self.subsample:
                    img = cv2.resize(img, (w//2, w//2))
                    w = w//2
                img = cv2.resize(img, (self.subsample, self.subsample))

            if self.gray:
                img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
                img = np.expand_dims(img, axis=2)
            else:
                # OpenCV loads images as BGR; convert to RGB
                img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

        if self.T is not None:
            img = self.T(img)

        return img, entry[self.target]
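The `subsample` option in the class above shrinks an image by repeated halving before a final resize to the requested side length. The sketch below reproduces only that size schedule, without touching any image data; `halving_schedule` is a hypothetical helper, and the input width of 1812 pixels is an illustrative value rather than a documented property of the dataset. Like the class, it assumes square images.

```python
def halving_schedule(width, subsample):
    """Return the successive side lengths used when shrinking a square image."""
    sizes = []
    w = width
    # Halve while the next half-size is still larger than the target.
    while w // 2 > subsample:
        w = w // 2
        sizes.append(w)
    # The final resize lands exactly on the requested side length.
    sizes.append(subsample)
    return sizes


print(halving_schedule(1812, 224))  # [906, 453, 226, 224]
```

If the image is already smaller than twice the target, the loop body never runs and only the final resize is applied.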
