The MAUS dataset focuses on collecting easily acquired physiological signals under different mental demand conditions. We used the N-back task to induce different levels of mental workload. This dataset can help in developing mental workload assessment systems based on wearable devices, especially PPG-based systems. The MAUS dataset provides ECG, Fingertip-PPG, Wrist-PPG, and GSR signals, so users can make their own comparisons between Fingertip-PPG and Wrist-PPG. Several studies can be carried out on this dataset.
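In the N-back task, a stimulus is a target when it matches the stimulus presented n positions earlier, and a response is correct when it agrees with that target status. The Python sketch below scores responses this way. It assumes stimuli and responses are already loaded as equal-length lists (True meaning the participant reported a match); the actual file layout and the n levels used are described in MAUS_Documentation.pdf, and both function names are hypothetical.

```python
def nback_targets(stimuli, n):
    """Return one boolean per trial: True where stimuli[i] equals stimuli[i - n]."""
    # The first n trials have no stimulus n positions back, so they are never targets.
    return [i >= n and stimuli[i] == stimuli[i - n] for i in range(len(stimuli))]


def nback_accuracy(stimuli, responses, n):
    """Fraction of trials where the response (True = "match") agrees with the target."""
    if len(stimuli) != len(responses):
        raise ValueError("stimuli and responses must have the same length")
    targets = nback_targets(stimuli, n)
    correct = sum(t == r for t, r in zip(targets, responses))
    return correct / len(stimuli)
```

For example, `nback_targets(["A", "B", "A", "C"], 2)` marks only the third trial as a target, because "A" matches the stimulus two positions earlier.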


The database is organized into two folders and a documentation file:
• Data – raw signal recordings for the individual participants, including the extracted inter-beat interval (IBI) sequences and the participants' responses in the N-back task
• Subjective_rating – subjective ratings of sleep quality and NASA-TLX
• MAUS_Documentation.pdf – documentation describing the dataset in detail.
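Since the Data folder includes extracted inter-beat interval (IBI) sequences, common time-domain heart rate variability features can be computed directly from them. The sketch below computes mean heart rate, SDNN, and RMSSD. It assumes the intervals are in milliseconds (check the documentation, as they may be stored in seconds) and that artifacts and ectopic beats have already been removed; the function name is hypothetical.

```python
import math


def hrv_features(ibi_ms):
    """Basic time-domain HRV features from inter-beat intervals in milliseconds (assumed unit)."""
    if len(ibi_ms) < 2:
        raise ValueError("need at least two inter-beat intervals")
    n = len(ibi_ms)
    mean_ibi = sum(ibi_ms) / n
    # SDNN: sample standard deviation of the intervals (n - 1 denominator)
    sdnn = math.sqrt(sum((x - mean_ibi) ** 2 for x in ibi_ms) / (n - 1))
    # RMSSD: root mean square of successive interval differences
    diffs = [b - a for a, b in zip(ibi_ms, ibi_ms[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return {
        "mean_hr_bpm": 60000.0 / mean_ibi,  # 60000 ms per minute
        "sdnn_ms": sdnn,
        "rmssd_ms": rmssd,
    }
```

Some tools compute SDNN with the population (n) denominator instead, so small numeric differences from other software are expected.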


This dataset contains information on 83 patients from India, including their clinical history, histopathological features, and mammograms. Its distinctive aspect is its collection of mammograms showing benign tumors, which can be used for the subclassification of benign tumors.


This dataset contains a zip folder of 80 mammograms and an Excel file with the mammographic, histopathological, and clinical features of all the patients.
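Because the images and the Excel file are delivered separately, it is worth checking which spreadsheet records have a matching mammogram before analysis. The sketch below assumes, hypothetically, that each image file is named after the patient ID used in the spreadsheet (for example `12.png`); the real naming scheme may differ. In practice the IDs could be read with `pandas.read_excel` and the file names with `os.listdir`; the helper name and the accepted suffixes are illustrative.

```python
from pathlib import PurePath

# Hypothetical set of image extensions to consider.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".dcm"}


def match_records_to_images(record_ids, filenames):
    """Pair spreadsheet IDs with image files, assuming files are named '<patient_id>.<ext>'.

    Returns (matched, ids_without_image, images_without_record).
    """
    stems = {}
    for name in filenames:
        p = PurePath(name)
        if p.suffix.lower() in IMAGE_SUFFIXES:
            stems[p.stem] = name
    ids = {str(r) for r in record_ids}
    found = set(stems)
    matched = {i: stems[i] for i in sorted(ids & found)}
    return matched, sorted(ids - found), sorted(found - ids)
```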


Recently, efforts have been made to build computer-assisted tools for cancer diagnosis via image processing. Such tools require image capture, stain color normalization, segmentation of the cells of interest, and classification to count malignant versus healthy cells. This dataset targets robust cell segmentation, which is the first stage in building such a tool for the plasma cell cancer Multiple Myeloma (MM), a type of blood cancer. The images are provided after stain color normalization.
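As a rough illustration of the segmentation and counting stages, the sketch below thresholds a grayscale image, labels connected components with `scipy.ndimage.label`, and discards blobs smaller than a minimum area. This is a naive baseline, not the method of the publications cited below: `threshold` and `min_area` are hypothetical parameters that would need tuning, touching cells merge into a single blob, and no malignant-versus-healthy classification is attempted.

```python
import numpy as np
from scipy import ndimage


def count_cells(gray, threshold, min_area=20):
    """Crude cell count: dark connected blobs of at least min_area pixels.

    gray: 2-D array in which stained nuclei are assumed darker than the background.
    Returns (count, label_image).
    """
    mask = ndimage.binary_fill_holes(gray < threshold)
    # 8-connected component labelling
    labels, n = ndimage.label(mask, structure=np.ones((3, 3), dtype=bool))
    if n == 0:
        return 0, labels
    areas = np.bincount(labels.ravel())[1:]  # pixel count per label 1..n
    keep = np.flatnonzero(areas >= min_area) + 1
    # Zero out components below min_area; surviving labels keep their original ids
    cleaned = np.where(np.isin(labels, keep), labels, 0)
    return len(keep), cleaned
```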



If you use this dataset, please cite the following publications:

  1. Anubha Gupta, Rahul Duggal, Shiv Gehlot, Ritu Gupta, Anvit Mangal, Lalit Kumar, Nisarg Thakkar, and Devprakash Satpathy, "GCTI-SN: Geometry-Inspired Chemical and Tissue Invariant Stain Normalization of Microscopic Medical Images," Medical Image Analysis, vol. 65, Oct 2020. DOI: (2020 IF: 11.148)
  2. Shiv Gehlot, Anubha Gupta and Ritu Gupta, "EDNFC-Net: Convolutional Neural Network with Nested Feature Concatenation for Nuclei-Instance Segmentation," ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 2020, pp. 1389-1393.
  3. Anubha Gupta, Pramit Mallick, Ojaswa Sharma, Ritu Gupta, and Rahul Duggal, "PCSeg: Color model driven probabilistic multiphase level set based tool for plasma cell segmentation in multiple myeloma," PLoS ONE 13(12): e0207908, Dec 2018. DOI: 10.1371/journal.pone.0207908

The Mother's Significant Feature (MSF) dataset was designed to provide data to researchers working to improve the health of women and children. The MSF records were collected in the Mumbai metropolitan region of Maharashtra, India. Women were interviewed just after childbirth between February 2018 and March 2021. MSF comprises 450 records with a total of 130 attributes covering the mother's features, the father's features, and health outcomes. A detailed dataset was created to understand the mother's features across three phases of her reproductive age, i.e.


We have provided copies of the forms used to collect the data and a README guide explaining the features in the dataset, along with the contents of all six datasets, submitted in Excel format.


This dataset contains:

- 88 patients
- the noncontrast computed tomography (NCCT) and computed tomography angiography (CTA) scans performed before thrombectomy
- the volume of interest (VOI) of the blood clot for NCCT and CTA

For each patient, the NCCT data are marked "2" and the CTA data are marked "1".
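Given a VOI mask, the clot volume is the number of voxels inside the mask multiplied by the volume of one voxel. The sketch below assumes the mask and the voxel spacing (in millimetres) have already been read from the image files, whose format is not described here, and that the mask is a 3-D array. The marker-to-modality mapping follows the "2" = NCCT, "1" = CTA convention; the function and constant names are hypothetical.

```python
import numpy as np

# Marker convention from the dataset description: "1" = CTA, "2" = NCCT.
MODALITY_BY_MARKER = {"1": "CTA", "2": "NCCT"}


def clot_volume_mm3(voi_mask, spacing_mm):
    """Volume of a binary 3-D VOI mask, given the voxel spacing along each axis in mm."""
    mask = np.asarray(voi_mask, dtype=bool)
    if mask.ndim != 3 or len(spacing_mm) != 3:
        raise ValueError("expected a 3-D mask and three spacing values")
    voxel_volume = float(np.prod(spacing_mm))  # mm^3 per voxel
    return int(mask.sum()) * voxel_volume
```

Dividing the result by 1000 gives the volume in millilitres.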


Tuberculosis detection plays a major role today: according to the Global Tuberculosis (TB) Report 2019, more than one million cases are reported per year in India. Although various tests are available, the chest X-ray is the most important one, without which detection is incomplete. Computationally designed algorithms have been built for several clinical and diagnostic functions using posteroanterior chest radiographs.


The EEG pain dataset was collected from 15 subjects using the cold pressor test (CPT). EEG signals were recorded with the Emotiv EPOC Flex Cap, using 32 electrodes (32 channels) placed over the scalp at predetermined locations according to the international 10-10 system, at a sampling rate of 128 Hz.
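A common first step with such recordings is to estimate per-channel power in standard frequency bands. The sketch below uses Welch's method (`scipy.signal.welch`) at the stated 128 Hz sampling rate with 2-second segments. The band limits are conventional choices, not values taken from this dataset, and the input is assumed to be a NumPy array shaped (channels, samples); the function name is hypothetical.

```python
import numpy as np
from scipy.signal import welch

FS = 128  # sampling rate of the recordings (Hz)
# Conventional EEG bands (assumed, not specified by the dataset)
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}


def band_powers(eeg, fs=FS, bands=BANDS):
    """Absolute band power per channel for an array shaped (n_channels, n_samples)."""
    freqs, psd = welch(eeg, fs=fs, nperseg=2 * fs, axis=-1)
    df = freqs[1] - freqs[0]  # frequency resolution (0.5 Hz here)
    out = {}
    for name, (lo, hi) in bands.items():
        idx = (freqs >= lo) & (freqs < hi)
        # Rectangle-rule integration of the power spectral density over the band
        out[name] = psd[..., idx].sum(axis=-1) * df
    return out
```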


Silk fibroin is the structural fiber of the silk filament, and it is usually separated from the external sericin coating by a chemical process called degumming. This process consists of an alkaline bath in which the silk cocoons are boiled for a set time. The degumming process is also known to affect the properties of the resulting silk fibroin fibers.


The data in the first sheet of the dataset are in tidy format (each row corresponds to an observation) and can be imported directly into R and processed with the tidyverse packages. Note that the row with standard order 49 corresponds to the reference degumming, while row 50 corresponds to the test on the bare silk fiber (not degummed). For the bare fiber, neither the mass loss nor the secondary structure was determined: because the fiber was not degummed, it was still surrounded by sericin, so its secondary structure could not be examined. The first two columns of the dataset are the standard order (the order in which the design of experiments data are analyzed) and the run order (the randomized order in which the trials were performed). The next four columns are the studied factors, and the remaining columns report the process responses (here, the properties of the resulting silk fibers).
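An equivalent split can be sketched in Python with pandas, the same language as the loader code in this document. The sketch selects columns by position (standard order, run order, four factors, then responses) and separates the reference row (standard order 49) and the bare-fiber row (standard order 50) from the design runs. The column names in the test data are hypothetical; the sheet itself could be read with `pd.read_excel(path, sheet_name=0)`, which needs an Excel engine such as openpyxl.

```python
import pandas as pd


def split_doe_table(df):
    """Split the first sheet into DoE runs, the reference row and the bare-fiber row.

    Columns are taken by position: 0 = standard order, 1 = run order,
    2-5 = studied factors, 6 onward = responses.
    """
    std_col = df.columns[0]
    factor_cols = list(df.columns[2:6])
    response_cols = list(df.columns[6:])
    reference = df[df[std_col] == 49]  # reference degumming
    bare = df[df[std_col] == 50]       # bare silk fiber (not degummed)
    runs = df[~df[std_col].isin([49, 50])]
    return runs, reference, bare, factor_cols, response_cols
```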

The second sheet contains the molecular weights of the tested samples. Here, only one sample per triplicate was tested. Both the standard order and the run order refer to the same samples as in the first sheet.
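Because both sheets share the standard order and run order, the molecular weights can be attached to the first sheet with a left join. In the sketch below, the key column names are hypothetical placeholders for the actual headers, and samples without a molecular-weight measurement get NaN. Passing `validate="many_to_one"` makes pandas raise an error if a key appears more than once in the second sheet.

```python
import pandas as pd


def attach_molecular_weight(sheet1, sheet2, keys=("StdOrder", "RunOrder")):
    """Left-join the molecular-weight sheet onto the main sheet by standard and run order.

    keys: hypothetical column names; replace them with the real headers.
    """
    return sheet1.merge(sheet2, on=list(keys), how="left", validate="many_to_one")
```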


Human neck movement data were acquired with the MetaWear CPRO device (accelerometer-based kinematic data). The data were fed into the OpenSim simulation software to extract kinematics and kinetics (muscle and joint forces, accelerations, and positions).
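A quick sanity check on raw accelerometer data, independent of OpenSim, is to estimate head tilt from gravity during quasi-static postures. The sketch below computes roll and pitch with the standard atan2 formulas. It assumes samples shaped (n, 3) in any consistent unit (for example g) and an axis convention that may not match how the CPRO was mounted, so axes and signs may need remapping. It does not reproduce the OpenSim kinematics or kinetics, and yaw (rotation about the vertical axis) cannot be recovered from an accelerometer alone; the function name is hypothetical.

```python
import numpy as np


def tilt_angles_deg(acc):
    """Estimate roll and pitch in degrees from accelerometer samples shaped (n, 3).

    Valid only when the head is roughly static, so gravity dominates the signal.
    Assumed axes: x forward, y left, z up when upright.
    """
    acc = np.asarray(acc, dtype=float)
    ax, ay, az = acc[:, 0], acc[:, 1], acc[:, 2]
    roll = np.degrees(np.arctan2(ay, az))
    pitch = np.degrees(np.arctan2(-ax, np.hypot(ay, az)))
    return roll, pitch
```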


Histopathological characterization of colorectal polyps allows clinicians to tailor patient management and follow-up, with the ultimate aim of avoiding or promptly detecting invasive carcinoma. Colorectal polyp characterization relies on the histological analysis of tissue samples to determine a polyp's malignancy and dysplasia grade. Deep neural networks achieve outstanding accuracy in medical pattern recognition; however, they require large sets of annotated training images.


To load the data, we provide below an example routine for the PyTorch framework. We provide the patches at two different scales, 800 µm and 7000 µm wide.

For each resolution, we provide .csv files containing the metadata of all included files, comprising:

  • image_id;
  • label (6 classes - HP, NORM, TA.HG, TA.LG, TVA.HG, TVA.LG);
  • type (4 classes - HP, NORM, HG, LG);
  • reference WSI;
  • reference region of interest in WSI (roi);
  • resolution (microns per pixel, mpp);
  • coordinates for the patch (x, y, w, h).
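The 6-class label appears to collapse into the 4-class type by dropping the adenoma subtype (TA or TVA) and keeping the dysplasia grade, so the first helper below can serve as a consistency check between the label and type columns. The second converts a patch's width and height to microns using its mpp value, assuming w and h are given in pixels at that resolution. Both helpers are hypothetical, not part of the dataset's tooling.

```python
def label_to_type(label):
    """Map a 6-class label (e.g. 'TA.HG') to the 4-class type (e.g. 'HG')."""
    if label in ("HP", "NORM"):
        return label
    # Adenoma labels end with the dysplasia grade after the dot
    grade = label.rsplit(".", 1)[-1]
    if grade not in ("HG", "LG"):
        raise ValueError(f"unexpected label: {label}")
    return grade


def patch_size_um(w, h, mpp):
    """Physical patch size in microns from pixel width/height and microns per pixel."""
    return w * mpp, h * mpp
```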

Below you can find the UNITOPatho Dataset class for PyTorch. More examples can be found here.

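```python
import os

import cv2
import numpy as np
import torch


class UNITOPatho(torch.utils.data.Dataset):
    """PyTorch dataset over UNITOPatho patches described by a metadata DataFrame."""

    def __init__(self, df, T, path, target, subsample=-1, gray=False, mock=False):
        self.path = path            # root folder of the chosen resolution
        self.df = df                # metadata rows, one per patch
        self.T = T                  # optional transform applied to each image
        self.target = target        # metadata column returned as the label
        self.subsample = subsample  # output side length in pixels, or -1 to keep the size
        self.mock = mock            # if True, return random images without reading files
        self.gray = gray            # if True, return single-channel images

        allowed_target = ['type', 'grade', 'top_label']
        if target not in allowed_target:
            raise ValueError(f'Target must be in {allowed_target}, got {target}')

        print(f'Loaded {len(self.df)} images')

    def __len__(self):
        return len(self.df)

    def __getitem__(self, index):
        entry = self.df.iloc[index]
        # Images are stored in one sub-folder per top-level label
        image_id = os.path.join(self.path, entry.top_label_name, entry.image_id)

        if self.mock:
            C = 1 if self.gray else 3
            img = np.random.randint(0, 255, (224, 224, C)).astype(np.uint8)
        else:
            img = cv2.imread(image_id)  # BGR uint8 array, or None if unreadable
            if img is None:
                raise FileNotFoundError(image_id)

            if self.subsample != -1:
                # Halve repeatedly before the final resize to limit aliasing
                w = img.shape[0]
                while w // 2 > self.subsample:
                    img = cv2.resize(img, (w // 2, w // 2))
                    w = w // 2
                img = cv2.resize(img, (self.subsample, self.subsample))

            if self.gray:
                img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
                img = np.expand_dims(img, axis=2)  # keep an explicit channel axis
            else:
                img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

        if self.T is not None:
            img = self.T(img)

        return img, entry[self.target]
```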