This dataset contains information on 83 patients from India, including each patient's clinical history, histopathological features, and mammogram. Its distinctive aspect is its collection of mammograms showing benign tumors, which can be used for the subclassification of benign tumors.


This dataset contains a zip folder of 80 mammograms and an Excel file with the mammographic, histopathological, and clinical features of all the patients.


Recently, efforts have been underway to build computer-assisted diagnostic tools for cancer diagnosis via image processing. Such tools require image capture, stain color normalization, segmentation of the cells of interest, and classification to count malignant versus healthy cells. This dataset is aimed at robust cell segmentation, which is the first stage in building such a tool for the plasma cell cancer Multiple Myeloma (MM), a type of blood cancer. The images are provided after stain color normalization.



If you use this dataset, please cite the following publications:

  1. Anubha Gupta, Rahul Duggal, Shiv Gehlot, Ritu Gupta, Anvit Mangal, Lalit Kumar, Nisarg Thakkar, and Devprakash Satpathy, "GCTI-SN: Geometry-Inspired Chemical and Tissue Invariant Stain Normalization of Microscopic Medical Images," Medical Image Analysis, vol. 65, Oct 2020. DOI: (2020 IF: 11.148)
  2. Shiv Gehlot, Anubha Gupta and Ritu Gupta, "EDNFC-Net: Convolutional Neural Network with Nested Feature Concatenation for Nuclei-Instance Segmentation," ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 2020, pp. 1389-1393.
  3. Anubha Gupta, Pramit Mallick, Ojaswa Sharma, Ritu Gupta, and Rahul Duggal, "PCSeg: Color model driven probabilistic multiphase level set based tool for plasma cell segmentation in multiple myeloma," PLoS ONE 13(12): e0207908, Dec 2018. DOI: 10.1371/journal.pone.0207908


Simulated data: dual-polarized antenna array in GNSS


Cover letter


Restricted mean survival time (RMST), recommended for reporting survival, lacks a tool to analyze multilevel factors. Gini's mean difference of RMSTs, Δ, is proposed and applied to compare a lymph node ratio-based classification (LNRc) with a number-based classification (ypN) in stage II/III breast cancer patients who were prospectively enrolled for neoadjuvant chemotherapy and underwent axillary dissection. The number of positive nodes (npos) classified patients into ypN0 (npos=0), ypN1 (npos=[1,3]), ypN2 (npos=[4,9]), and ypN3 (npos≥10).
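As a minimal sketch of the two definitions above, the Python snippet below maps a positive-node count to its ypN class and computes Gini's mean difference in its classic unweighted form, i.e. the average absolute difference over all pairs of values. The function names and the RMST values are hypothetical, and the proposed Δ may be defined differently (for example, weighted by group size); refer to the paper for the exact estimator.

```python
from itertools import combinations


def ypn_class(npos):
    """Map the number of positive nodes to the ypN class given in the text."""
    if npos < 0:
        raise ValueError("npos must be non-negative")
    if npos == 0:
        return "ypN0"
    if npos <= 3:
        return "ypN1"
    if npos <= 9:
        return "ypN2"
    return "ypN3"


def gini_mean_difference(values):
    """Classic unweighted Gini's mean difference: mean |x_i - x_j| over all pairs."""
    pairs = list(combinations(values, 2))
    if not pairs:
        raise ValueError("need at least two values")
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


# Hypothetical RMSTs (in months) per class, for illustration only.
rmst_by_class = {"ypN0": 55.0, "ypN1": 50.0, "ypN2": 42.0, "ypN3": 30.0}
delta = gini_mean_difference(list(rmst_by_class.values()))
```

A larger value indicates that the classes are, on average, further apart in RMST, which is the sense in which such a measure can be used to compare two classifications.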


Breast cancer Neoadjuvant chemotherapy

1 header row.

370 data rows

Columns are patient characteristics; refer to the papers for a detailed description.



Histopathological characterization of colorectal polyps allows clinicians to tailor patients' management and follow-up, with the ultimate aim of avoiding or promptly detecting an invasive carcinoma. Colorectal polyp characterization relies on the histological analysis of tissue samples to determine polyp malignancy and dysplasia grade. Deep neural networks achieve outstanding accuracy in medical pattern recognition; however, they require large sets of annotated training images.


To load the data, we provide below an example routine for PyTorch. We provide two different resolutions, 800 and 7000 um/px.

Within each resolution, we provide .csv files, containing all metadata information for all the included files, comprising:

  • image_id;
  • label (6 classes - HP, NORM, TA.HG, TA.LG, TVA.HG, TVA.LG);
  • type (4 classes - HP, NORM, HG, LG);
  • reference WSI;
  • reference region of interest in WSI (roi);
  • resolution (micron per pixels, mpp);
  • coordinates for the patch (x, y, w, h).
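The metadata fields listed above can be read with Python's standard `csv` module. The sketch below is illustrative only: the column names (`image_id`, `label`, `type`, `wsi`, `roi`, `mpp`, `x`, `y`, `w`, `h`) and the two sample rows are assumptions, not the actual file contents, so check the header of the provided .csv files before reusing it. It counts patches per label and converts the patch width from pixels to microns using `mpp`.

```python
import csv
import io
from collections import Counter

# Hypothetical metadata excerpt; real column names and values may differ.
SAMPLE_CSV = """image_id,label,type,wsi,roi,mpp,x,y,w,h
patch_0001.png,TA.LG,LG,slide_A,0,0.5,1000,2000,1600,1600
patch_0002.png,NORM,NORM,slide_B,1,0.5,0,400,1600,1600
"""


def load_metadata(handle):
    """Read metadata rows and convert the numeric fields."""
    rows = []
    for row in csv.DictReader(handle):
        row["mpp"] = float(row["mpp"])
        for key in ("x", "y", "w", "h"):
            row[key] = int(row[key])
        # Physical patch width in microns = width in pixels * microns per pixel.
        row["width_um"] = row["w"] * row["mpp"]
        rows.append(row)
    return rows


rows = load_metadata(io.StringIO(SAMPLE_CSV))
label_counts = Counter(row["label"] for row in rows)
```

To use it on a real file, pass an open file handle (for example `open(path, newline="")`) instead of the in-memory sample.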

Below you can find the dataloader class of UNITOPatho for PyTorch. More examples can be found here.

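import torch
import torchvision
import numpy as np
import cv2
import os


# Assumed base class: the original line is truncated after "UNITOPatho(";
# torch.utils.data.Dataset is the standard PyTorch map-style dataset.
class UNITOPatho(torch.utils.data.Dataset):
    def __init__(self, df, T, path, target, subsample=-1, gray=False, mock=False):
        self.path = path            # root folder containing the images
        self.df = df                # metadata table (one row per image)
        self.T = T                  # optional transform applied to each image
        self.target = target        # name of the column returned as the label
        self.subsample = subsample  # output side length, or -1 to keep the size
        self.mock = mock            # if True, return random images (no disk I/O)
        self.gray = gray            # if True, return single-channel images

        allowed_target = ['type', 'grade', 'top_label']
        if target not in allowed_target:
            # Stop here so that an invalid target cannot be used silently.
            raise ValueError(f'Target must be in {allowed_target}, got {target}')

        print(f'Loaded {len(self.df)} images')

    def __len__(self):
        return len(self.df)

    def __getitem__(self, index):
        entry = self.df.iloc[index]
        image_id = entry.image_id
        image_id = os.path.join(self.path, entry.top_label_name, image_id)

        img = None
        if self.mock:
            # Random placeholder image, useful for testing the pipeline.
            C = 1 if self.gray else 3
            img = np.random.randint(0, 255, (224, 224, C)).astype(np.uint8)
        else:
            img = cv2.imread(image_id)

            if self.subsample != -1:
                # Halve the (square) image repeatedly, then resize to the target.
                w = img.shape[0]
                while w//2 > self.subsample:
                    img = cv2.resize(img, (w//2, w//2))
                    w = w//2
                img = cv2.resize(img, (self.subsample, self.subsample))

            if self.gray:
                img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
                img = np.expand_dims(img, axis=2)
            else:
                # OpenCV loads images as BGR; convert to RGB.
                img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

        if self.T is not None:
            img = self.T(img)

        # Assumed index: the original shows an empty "entry[]"; the selected
        # target column is the natural label to return.
        return img, entry[self.target]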
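The `subsample` branch of `__getitem__` shrinks an image by repeated halving before the final resize to the requested size. The pure-Python sketch below reproduces only that size schedule, without OpenCV; `halving_schedule` is a hypothetical helper name, and like the loader it assumes square images.

```python
def halving_schedule(width, target):
    """Return the successive side lengths used when downsampling to `target`."""
    if width <= 0 or target <= 0:
        raise ValueError("width and target must be positive")
    sizes = []
    w = width
    # Halve while the halved size is still larger than the target.
    while w // 2 > target:
        w = w // 2
        sizes.append(w)
    # The last step always resizes directly to the target size.
    sizes.append(target)
    return sizes


schedule = halving_schedule(1812, 224)
```

Downsampling in steps of two rather than in one jump is a common way to limit aliasing, although the source does not state why the loader was written this way.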


Microscopic image-based analysis plays an important role in computer-based histopathological diagnostics. Identifying childhood medulloblastoma and its proper subtype from biopsy tissue specimens of childhood tumors is an integral part of prognosis. The dataset consists of childhood medulloblastoma (CMB) biopsy samples. The images were taken at 10x and 100x microscopic magnification and are uploaded in separate folders. They comprise normal brain tissue cell samples and CMB cell samples of different WHO-defined subtypes. An Excel sheet describing the data is also provided.


The dataset contains two folders of images at different magnifications, i.e., 10x and 100x. The type of each image is described in the provided Excel file. Each slide has a unique number, and a number in brackets indicates that the corresponding image comes from that same slide.


Supplementary materials (Table S2).


Proteome analysis of extracellular vesicles, isolated from murine breast cancer cells or serum of healthy mice.


The migration of cancer cells is highly regulated by the biomechanical properties of their local microenvironment. Using 3D scaffolds of simple composition, several aspects of cancer cell mechanosensing (signal transduction, ECM remodeling, traction forces) have been analyzed separately in the context of cell migration. However, a combined study of these factors in 3D scaffolds that more closely resemble the complex microenvironment of the cancer ECM is still missing.


The dataset consists of several zip files. The name of each file identifies the figure (and figure panel) that the data refer to.


Hyperspectral (HS) imaging is a non-contact, non-ionizing, and non-invasive technique that has proven suitable for medical diagnosis. However, the volume of information contained in these images makes it difficult to provide the surgeon with information about the boundaries in real time. To that end, High-Performance-Computing (HPC) platforms become necessary. This paper compares the performance of five different HPC platforms when processing a spatial-spectral approach to classify HS images, assessing their main benefits and drawbacks.


Dataset description


1) Size of the images


- PD1C1: 1000 samples x 1000 lines x 100 bands

- PD1C2: 1000 samples x 1000 lines x 100 bands

- PD1C3: 1000 samples x 1000 lines x 100 bands


2) Image composition


- The information is stored band by band

- Within each band, the information is stored line by line

- The data type is float
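Given the layout described above (band by band, then line by line within each band), every value sits at a fixed offset in the file. The sketch below is a minimal reader under stated assumptions: "float" is taken to mean 32-bit IEEE 754, little-endian, with no file header, none of which the description confirms; `pixel_offset` and `read_pixel` are hypothetical helper names.

```python
import io
import struct

BYTES_PER_VALUE = 4  # assumption: 32-bit float


def pixel_offset(band, line, sample, lines, samples):
    """Byte offset of one value in a band-sequential file."""
    return ((band * lines + line) * samples + sample) * BYTES_PER_VALUE


def read_pixel(handle, band, line, sample, lines, samples):
    """Read a single value from a binary handle (assumed little-endian)."""
    handle.seek(pixel_offset(band, line, sample, lines, samples))
    (value,) = struct.unpack("<f", handle.read(BYTES_PER_VALUE))
    return value


# Tiny synthetic cube (2 bands x 3 lines x 4 samples) standing in for a real file.
bands, lines, samples = 2, 3, 4
values = [float(i) for i in range(bands * lines * samples)]
cube = io.BytesIO(struct.pack(f"<{len(values)}f", *values))

value = read_pixel(cube, band=1, line=2, sample=3, lines=lines, samples=samples)
```

Under the same assumptions, each band of a 1000 x 1000 x 100 image occupies 4,000,000 bytes, so a file size of 400,000,000 bytes would be one way to check the 32-bit hypothesis.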


3) Important information


This database only contains the dermatological images. The three brain images, obtained within the context of HELICoiD EU project, are already available in the following repository:


For downloading the brain images used in this research:

- PB1C1: Op12C1

- PB2C1: Op15C1

- PB3C1: Op20C1