UNITOPATHO

Citation Author(s):
Luca Bertero, Medical Sciences Department, University of Turin, 10126, Torino, Italy
Carlo Alberto Barbano, Computer Science Department, University of Turin, 10149 Torino, Italy
Daniele Perlo, Computer Science Department, University of Turin, 10149 Torino, Italy
Enzo Tartaglione, Computer Science Department, University of Turin, 10149 Torino, Italy
Paola Cassoni, Medical Sciences Department, University of Turin, 10126, Torino, Italy
Marco Grangetto, Computer Science Department, University of Turin, 10149 Torino, Italy
Attilio Fiandrotti, Computer Science Department, University of Turin, 10149 Torino, Italy
Alessandro Gambella
Luca Cavallo
Submitted by: Enzo Tartaglione
Last updated: Tue, 05/04/2021 - 08:53
DOI: 10.21227/9fsv-tm25

Abstract 

Histopathological characterization of colorectal polyps makes it possible to tailor patients' management and follow-up, with the ultimate aim of avoiding or promptly detecting an invasive carcinoma. Colorectal polyp characterization relies on the histological analysis of tissue samples to determine the polyp's malignancy and dysplasia grade. Deep neural networks achieve outstanding accuracy in medical pattern recognition; however, they require large sets of annotated training images.

We introduce UniToPatho, an annotated dataset of 9536 hematoxylin and eosin stained patches extracted from 292 whole-slide images, meant for training deep neural networks for colorectal polyp classification and adenoma grading. The slides were acquired with a Hamamatsu Nanozoomer S210 scanner at 20× magnification (0.4415 μm/px). Each slide belongs to a different patient and was annotated by expert pathologists according to the following six classes:

 

  • NORM - Normal tissue;
  • HP - Hyperplastic Polyp;
  • TA.HG - Tubular Adenoma, High-Grade dysplasia;
  • TA.LG - Tubular Adenoma, Low-Grade dysplasia;
  • TVA.HG - Tubulo-Villous Adenoma, High-Grade dysplasia;
  • TVA.LG - Tubulo-Villous Adenoma, Low-Grade dysplasia.
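
Each of the six labels above combines a lesion kind with an optional dysplasia grade, separated by a dot. The following is a minimal sketch of how such a label could be split and described; the helper names `split_label` and `describe` are hypothetical and are not part of the dataset's own code.

```python
from typing import Optional, Tuple

# Names taken from the class list above.
LESION_NAMES = {
    "NORM": "Normal tissue",
    "HP": "Hyperplastic Polyp",
    "TA": "Tubular Adenoma",
    "TVA": "Tubulo-Villous Adenoma",
}
GRADE_NAMES = {"HG": "High-Grade dysplasia", "LG": "Low-Grade dysplasia"}


def split_label(label: str) -> Tuple[str, Optional[str]]:
    """Split a label such as 'TA.HG' into (lesion, grade); grade is None for NORM and HP."""
    lesion, _, grade = label.partition(".")
    return lesion, (grade or None)


def describe(label: str) -> str:
    """Return the human-readable description of a six-class label."""
    lesion, grade = split_label(label)
    text = LESION_NAMES[lesion]
    return text if grade is None else f"{text}, {GRADE_NAMES[grade]}"


if __name__ == "__main__":
    for label in ("NORM", "HP", "TA.HG", "TA.LG", "TVA.HG", "TVA.LG"):
        print(label, "->", describe(label))
```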

 

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825111, DeepHealth Project.

 

 

Instructions: 

To load the data, we provide below an example routine for PyTorch. The patches are provided at two different resolutions, 800 and 7000 μm.
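
If the two figures are read as the physical side length of a patch in micrometres (an assumption, since the page does not state this explicitly), the scanner resolution quoted in the abstract gives the corresponding side length in full-resolution pixels. A minimal sketch of that arithmetic, with the hypothetical helper `side_in_pixels`:

```python
SCAN_MPP = 0.4415  # micrometres per pixel at 20x magnification, from the abstract


def side_in_pixels(side_um: float, mpp: float = SCAN_MPP) -> int:
    """Convert a physical side length (in micrometres) to pixels at the given resolution."""
    return round(side_um / mpp)


if __name__ == "__main__":
    # Assumption: 800 and 7000 are patch side lengths in micrometres.
    for side_um in (800, 7000):
        print(f"{side_um} um -> {side_in_pixels(side_um)} px")
```

Under this reading, the two scales correspond to roughly 1812 and 15855 pixels per side at the scanning resolution.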

For each resolution, we provide .csv files containing the metadata for all included files:

  • image_id;
  • label (6 classes - HP, NORM, TA.HG, TA.LG, TVA.HG, TVA.LG);
  • type (4 classes - HP, NORM, HG, LG);
  • reference WSI;
  • reference region of interest in WSI (roi);
  • resolution (microns per pixel, mpp);
  • coordinates for the patch (x, y, w, h).
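
A metadata file of this kind can be explored with pandas before building a dataset object. The sketch below is illustrative only: the rows are synthetic and the column names (`label`, `wsi`, and so on) are assumptions derived from the list above, so they may differ from the headers in the actual .csv files.

```python
import io

import pandas as pd

# Synthetic stand-in for a metadata .csv file (illustrative rows, assumed column names).
CSV_TEXT = """image_id,label,type,wsi,roi,mpp,x,y,w,h
patch_0.png,TA.HG,HG,slide_A,roi_1,0.4415,0,0,1812,1812
patch_1.png,NORM,NORM,slide_B,roi_1,0.4415,1812,0,1812,1812
patch_2.png,TVA.LG,LG,slide_C,roi_2,0.4415,0,1812,1812,1812
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Number of patches per six-class label.
label_counts = df["label"].value_counts().to_dict()

# Keep only adenoma patches, i.e. those whose type is a dysplasia grade.
adenomas = df[df["type"].isin(["HG", "LG"])]

# Physical patch width in micrometres: width in pixels times microns per pixel.
patch_width_um = df["w"] * df["mpp"]

if __name__ == "__main__":
    print(label_counts)
    print(adenomas[["image_id", "label", "type"]])
    print(patch_width_um.round(1).tolist())
```

With a real file, `io.StringIO(CSV_TEXT)` would be replaced by the path to the .csv inside the chosen resolution folder.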

Below is the PyTorch Dataset class for UniToPatho. More examples can be found here.


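import os

import cv2
import numpy as np
import torch
import torchvision  # not used by the class itself; handy for building the transform T


class UNITOPatho(torch.utils.data.Dataset):
    def __init__(self, df, T, path, target, subsample=-1, gray=False, mock=False):
        self.path = path            # root folder, with one sub-folder per top_label_name
        self.df = df                # metadata table (pandas DataFrame)
        self.T = T                  # optional transform applied to each image
        self.target = target        # name of the column returned as the label
        self.subsample = subsample  # output side length in pixels; -1 keeps the original size
        self.mock = mock            # if True, return random images instead of reading files
        self.gray = gray            # if True, return single-channel images

        allowed_target = ['type', 'grade', 'top_label']
        if target not in allowed_target:
            raise ValueError(f'Target must be in {allowed_target}, got {target}')

        print(f'Loaded {len(self.df)} images')

    def __len__(self):
        return len(self.df)

    def __getitem__(self, index):
        entry = self.df.iloc[index]
        image_path = os.path.join(self.path, entry.top_label_name, entry.image_id)

        img = None
        if self.mock:
            # Random image with the requested number of channels.
            C = 1 if self.gray else 3
            img = np.random.randint(0, 255, (224, 224, C)).astype(np.uint8)
        else:
            img = cv2.imread(image_path)
            if img is None:
                raise FileNotFoundError(f'Could not read image {image_path}')

            if self.subsample != -1:
                # Halve the (square) image repeatedly, then resize to the target size.
                w = img.shape[0]
                while w//2 > self.subsample:
                    img = cv2.resize(img, (w//2, w//2))
                    w = w//2
                img = cv2.resize(img, (self.subsample, self.subsample))

            if self.gray:
                img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
                img = np.expand_dims(img, axis=2)
            else:
                # OpenCV loads images as BGR; convert to RGB.
                img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

        if self.T is not None:
            img = self.T(img)

        return img, entry[self.target]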
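
When `subsample` is set, the class above shrinks the image by repeated halving before the final resize to the requested size. The sequence of intermediate side lengths can be reproduced without OpenCV; the sketch below uses a hypothetical helper, `halving_schedule`, and a hypothetical input width of 2000 pixels purely for illustration. Like the loader, it assumes square images.

```python
from typing import List


def halving_schedule(width: int, target: int) -> List[int]:
    """Return the successive side lengths produced by the loader's subsampling loop."""
    sizes = []
    w = width
    # Same condition as in __getitem__: halve while the halved size is still above the target.
    while w // 2 > target:
        w = w // 2
        sizes.append(w)
    # The final resize always goes to the requested side length.
    sizes.append(target)
    return sizes


if __name__ == "__main__":
    print(halving_schedule(2000, 224))  # [1000, 500, 250, 224]
```

Halving in steps rather than resizing once is a common way to limit aliasing when an image is reduced by a large factor.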