Because no good data source covers all diseases (from generic to chronic) together with their treatment courses, a new dataset was synthesized by exploring several medical websites and resources. It provides the precaution list corresponding to more than 1000 diagnoses. The file prec_t.csv has three columns: (did, diagnose, pid) = (disease identifier, disease name, treatment course). This dataset can be used in many machine learning or deep learning based healthcare applications.
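
As a minimal sketch of how the three columns might be consumed, the snippet below groups treatment-course entries by disease name. It assumes that prec_t.csv is comma-separated and has a header row with exactly the names did, diagnose, and pid; the function name `load_precautions` and the sample rows are hypothetical and are not taken from the dataset.

```python
import csv
import io
from collections import defaultdict


def load_precautions(handle):
    """Map each disease name to the list of its treatment-course entries."""
    by_disease = defaultdict(list)
    for row in csv.DictReader(handle):
        by_disease[row["diagnose"].strip()].append(row["pid"].strip())
    return dict(by_disease)


# Made-up rows that only mimic the documented column layout (not real data).
sample = io.StringIO(
    "did,diagnose,pid\n"
    "1,Example disease A,10\n"
    "1,Example disease A,11\n"
    "2,Example disease B,12\n"
)
precautions = load_precautions(sample)
print(precautions)
```

For the real file, the same function could be called as `load_precautions(open("prec_t.csv", newline=""))`, provided the header assumption holds.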


The University of Turin (UniTO) released the open-access dataset Stoke collected for the homonymous Use Case 3 in the DeepHealth project (https://deephealth-project.eu/). UniToBrain is a dataset of Computed Tomography (CT) perfusion images (CTP).


Visit https://github.com/EIDOSlab/UC3-UNITOBrain for the full companion code, in which a U-Net model is trained on the dataset.


It contains data for four omic profiles (CNV, mRNA, miRNA, and protein) for BRCA, LGG, and LUAD, obtained from the TCGA project.

In addition, we provide synthetic data for a mixture of isotropic distributions.


Microwave-based breast cancer detection is a growing field that has been investigated as a potential novel method for breast cancer detection. Breast microwave sensing (BMS) systems use low-powered, non-ionizing microwave signals to interrogate the breast tissues. While some BMS systems have been evaluated in clinical trials, many challenges remain before these systems can be used as a viable clinical option, and breast phantoms (breast models) allow for rigorous and controlled experimental investigations.


The University of Manitoba Breast Microwave Imaging Dataset (UM-BMID) is an open-access dataset available to all researchers. The dataset contains data from experimental scans of MRI-derived breast phantoms. The dataset itself can be found at https://bit.ly/UM-bmid. The complete documentation for the dataset is also available at this link.

A GitHub page associated with the dataset can be found here: https://github.com/UManitoba-BMS/UM-BMID. The dataset is described in an accepted manuscript: T. Reimer, J. Krenkevich, and S. Pistorius, "An open-access experimental dataset for breast microwave imaging," in _2020 European Conference on Antennas and Propagation (EuCAP 2020)_, Copenhagen, Denmark, Mar. 2020, pp. 1-5, doi:10.23919/EuCAP48036.2020.9135659. This GitHub repository (https://github.com/UManitoba-BMS/UM-BMID) contains the code used to produce the results presented in that paper and supportive scripts for the UM-BMID dataset.


Recent advances in the availability of computational power and in cloud computing have prompted extensive research in epileptic seizure detection and prediction. The EEG (electroencephalogram) datasets from ‘Dept. of Epileptology, Univ. of Bonn’ and the ‘CHB-MIT Scalp EEG Database’ are publicly available and are the most sought after among researchers. The Bonn dataset is very small compared to CHB-MIT, yet researchers still prefer it because it comes in a simple '.txt' format. The dataset published here is a preprocessed form of CHB-MIT and is available in '.csv' format.


Procedure :

  1. The tool used for preprocessing is Anaconda-Jupyter Notebook on an Intel 8th gen i5 processor with 8GB RAM.
  2. The dataset is prepared by extracting datapoints from the '.edf' files using the mne package in Python. Equal amounts of preictal and ictal data are extracted.
  3. A period of 4096 seconds (about 68 minutes) each of preictal and ictal data is extracted from the '.edf' files. All annotated ictal periods for the 24 patients have been included in the dataset.
  4. Datapoints are loaded and preprocessed as dataframes using the pandas package in Python.
  5. As much system RAM as possible should be available, because the dataframes are large.
  6. The file chbmit_preprocessed_data.csv can be used as is for machine learning and deep learning models.
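
The extraction in steps 2 to 4 could be sketched roughly as follows. This is not the authors' notebook: the helper names `matched_windows` and `extract_segment` are hypothetical, and the rule for choosing the preictal window (the stretch of equal length that ends where the seizure starts) is an assumption, since the procedure above does not state how the preictal segments were selected. Only `mne.io.read_raw_edf`, `Raw.crop`, and `Raw.to_data_frame` are real mne calls.

```python
def matched_windows(seizure_start, seizure_end):
    """Return (preictal, ictal) windows in seconds with equal duration.

    Assumption: the preictal window is the stretch of the same length
    that ends exactly where the annotated seizure begins.
    """
    if seizure_end <= seizure_start:
        raise ValueError("seizure_end must be after seizure_start")
    duration = seizure_end - seizure_start
    if seizure_start < duration:
        raise ValueError("not enough recording before the seizure")
    preictal = (seizure_start - duration, seizure_start)
    ictal = (seizure_start, seizure_end)
    return preictal, ictal


def extract_segment(edf_path, tmin, tmax):
    """Read one '.edf' file and return the [tmin, tmax] span as a dataframe."""
    import mne  # imported lazily so matched_windows works without mne installed

    raw = mne.io.read_raw_edf(edf_path, preload=False, verbose="ERROR")
    # crop() works in seconds; to_data_frame() needs pandas to be installed.
    return raw.copy().crop(tmin=tmin, tmax=tmax).to_data_frame()


# Illustrative annotation values, not taken from the dataset.
preictal_window, ictal_window = matched_windows(100, 140)
print(preictal_window, ictal_window)
```

A segment would then be obtained with, for example, `extract_segment(path_to_edf, *ictal_window)`, where `path_to_edf` points to a file downloaded from the original database.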

Data Availability :

The dataset contains the following files.

  • chbmit_ictal_raw_data.csv : This file contains only ictal data from all 24 patients. The channels vary widely across recordings and amount to 96 columns in this file.
  • chbmit_preictal_raw_data.csv : This file contains only preictal data from all 24 patients. The channels vary widely across recordings and amount to 96 columns in this file.
  • chbmit_preictal_23channels_data.csv : This file contains only preictal data from all 24 patients. Only 23 channels are retained, giving 23 columns in this file.
  • chbmit_ictal_23channels_data.csv : This file contains only ictal data from all 24 patients. Only 23 channels are retained, giving 23 columns in this file.
  • chbmit_preprocessed_data.csv : This file contains balanced preictal and ictal data from all 24 patients. Only 23 channels are retained and an outcome column is added, giving 24 columns in this file. In the outcome column, '0' indicates preictal and '1' indicates ictal.
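
To illustrate how the labelled file might feed a model, the sketch below separates the channel columns from the outcome column and counts the two classes. It assumes a comma-separated file with a header row and a label column literally named `outcome`; the actual header name is not given above, so it is exposed as a parameter. The function `split_features_labels` and the two sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def split_features_labels(handle, label_column="outcome"):
    """Split CSV rows into channel values (floats) and 0/1 labels (ints)."""
    features, labels = [], []
    for row in csv.DictReader(handle):
        # int(float(...)) also accepts labels stored as "0.0" or "1.0".
        labels.append(int(float(row.pop(label_column))))
        features.append([float(value) for value in row.values()])
    return features, labels


# Two made-up rows with two channels instead of 23 (not real EEG data).
sample = io.StringIO("ch1,ch2,outcome\n0.5,-1.0,0\n2.0,3.5,1\n")
X, y = split_features_labels(sample)
class_counts = Counter(y)  # 0 = preictal, 1 = ictal
print(class_counts)
```

Because the full file is large, reading it in chunks with pandas may be more practical; the standard-library version is used here only to keep the example dependency-free.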

This dataset is prepared with data reduction techniques. Data cleaning and data transformation need to be done as suitable for the application or model under development. 

Original Data:


The original raw dataset in '.edf' format is available at https://physionet.org/content/chbmit/1.0.0/ and should be cited as:

Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220


The MAUS dataset focuses on collecting easily acquired physiological signals under different mental demand conditions. We used the N-back task to induce different levels of mental workload. This dataset can help in developing a mental workload assessment system based on wearable devices, especially a PPG-based one. The MAUS dataset provides ECG, Fingertip-PPG, Wrist-PPG, and GSR signals, so users can make their own comparison between Fingertip-PPG and Wrist-PPG. Further studies can be carried out on this dataset.


The database is organized in 2 folders and documentation:
• Data – raw signal recordings for the individual participants, including the extracted Inter-Beat-Interval sequences and the participants’ responses in the N-back task
• Subjective_rating – subjective rating of sleep quality and NASA-TLX
• MAUS_Documentation.pdf – documentation of dataset description and details.


This dataset has information on 83 patients from India. It contains the patients’ clinical history, histopathological features, and mammograms. The distinctive aspect of this dataset is its collection of mammograms with benign tumors, which can be used for the subclassification of benign tumors.


This dataset contains a zip folder of 80 mammograms and an Excel file with the mammographic, histopathological, and clinical features of all the patients.


Of late, efforts are underway to build computer-assisted diagnostic tools for cancer diagnosis via image processing. Such computer-assisted tools require capturing of images, stain color normalization of images, segmentation of cells of interest, and classification to count malignant versus healthy cells. This dataset is positioned towards robust segmentation of cells which is the first stage to build such a tool for plasma cell cancer, namely, Multiple Myeloma (MM), which is a type of blood cancer. The images are provided after stain color normalization.



If you use this dataset, please cite the following publications:

  1. Anubha Gupta, Rahul Duggal, Shiv Gehlot, Ritu Gupta, Anvit Mangal, Lalit Kumar, Nisarg Thakkar, and Devprakash Satpathy, "GCTI-SN: Geometry-Inspired Chemical and Tissue Invariant Stain Normalization of Microscopic Medical Images," Medical Image Analysis, vol. 65, Oct 2020. DOI: https://doi.org/10.1016/j.media.2020.101788. (2020 IF: 11.148)
  2. Shiv Gehlot, Anubha Gupta and Ritu Gupta, "EDNFC-Net: Convolutional Neural Network with Nested Feature Concatenation for Nuclei-Instance Segmentation," ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 2020, pp. 1389-1393.
  3. Anubha Gupta, Pramit Mallick, Ojaswa Sharma, Ritu Gupta, and Rahul Duggal, "PCSeg: Color model driven probabilistic multiphase level set based tool for plasma cell segmentation in multiple myeloma," PLoS ONE 13(12): e0207908, Dec 2018. DOI: 10.1371/journal.pone.0207908

The Mother’s Significant Feature (MSF) Dataset has been designed to provide data to researchers working towards the betterment of women's and children's health. The MSF dataset records were collected from the Mumbai metropolitan region in Maharashtra, India. Women were interviewed just after childbirth between February 2018 and March 2021. MSF comprises 450 records with a total of 130 attributes consisting of mother’s features, father’s features and health outcomes. A detailed dataset is created to understand the mother’s features spread across three phases of her reproductive age i.e.


We have provided a copy of the forms used to collect the data, a readme guide to help understand the features in the dataset, and the content of all 6 submitted datasets in Excel sheet format.


This data set contains:


- 88 patients


- the noncontrast computed tomography (NCCT) and computed tomography angiography (CTA) scans performed before thrombectomy


- the volume of interest (VOI) of the blood clot for NCCT and CTA


For each patient NCCT data is marked "2" and CTA is marked "1".

