Microwave-based sensing is a growing field that has been investigated as a potential novel method for breast cancer detection. Breast microwave sensing (BMS) systems use low-power, non-ionizing microwave signals to interrogate breast tissue. While some BMS systems have been evaluated in clinical trials, many challenges remain before these systems can become a viable clinical option. Breast phantoms (breast models) allow for rigorous and controlled experimental investigations of these challenges.


The University of Manitoba Breast Microwave Imaging Dataset (UM-BMID) is an open-access dataset available to all researchers. The dataset contains data from experimental scans of MRI-derived breast phantoms. The dataset and its complete documentation can be found at https://bit.ly/UM-bmid.

The dataset is described in an accepted manuscript: T. Reimer, J. Krenkevich, and S. Pistorius, "An open-access experimental dataset for breast microwave imaging," in _2020 European Conference on Antennas and Propagation (EuCAP 2020)_, Copenhagen, Denmark, Mar. 2020, pp. 1-5, doi: 10.23919/EuCAP48036.2020.9135659. The associated GitHub repository (https://github.com/UManitoba-BMS/UM-BMID) contains the code used to produce the results presented in that paper and supporting scripts for the UM-BMID dataset.


The original datasets are NPInter4158 [1], NPInter10412 [2], RPI7317 [3], RPI2241 [4], and RPI369 [4]. Only their positive samples were used in our work.

Rather than pairing at random, we selected more reliable negative samples using a strategy originally introduced by Zhang et al. in the LPI-CNNCP [5] study.


1. Visualization of convolutional neural network layers for one participant at an ROI of 301 × 301

2. Convolutional neural network structure analysis in Matlab

3. Convolutional neural network Matlab code

4. Videos of brightness mode (B-mode) ultrasound images from two participants during the recorded walking trials at 5 different speeds


This dataset consists of EEG data from 40 epileptic seizure patients (both male and female) aged 4 to 80 years. The raw data were collected with an Allengers VIRGO EEG machine at Medisys Hospitals, Hyderabad, India. The EEG electrodes were placed according to the international 10-20 system. The EEG data were recorded from 16 channels (FP2-F4, F4-C4, C4-P4, P4-O2, FP1-F3, F3-C3, C3-P3, P3-O1, FP2-F8, F8-T4, T4-T6, T6-O2, FP1-F7, F7-T3, T3-T5, and T5-O1) at 256 samples per second.
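As a small illustration of the recording layout described above, the following sketch lists the 16 bipolar channels and converts a sample index into elapsed time at 256 samples per second. The names `CHANNELS`, `SAMPLING_RATE_HZ`, `split_channel` and `sample_to_seconds` are hypothetical helpers introduced here for illustration; they are not part of the dataset or of any tool shipped with it.

```python
# Illustrative helpers only; every name here is an assumption, not part of the dataset.
SAMPLING_RATE_HZ = 256  # samples per second, as stated in the description

# The 16 bipolar channels listed in the description, in the same order.
CHANNELS = [
    "FP2-F4", "F4-C4", "C4-P4", "P4-O2",
    "FP1-F3", "F3-C3", "C3-P3", "P3-O1",
    "FP2-F8", "F8-T4", "T4-T6", "T6-O2",
    "FP1-F7", "F7-T3", "T3-T5", "T5-O1",
]


def split_channel(name):
    """Split a bipolar channel label such as 'FP2-F4' into its two electrodes."""
    first, second = name.split("-")
    return first, second


def sample_to_seconds(sample_index, rate_hz=SAMPLING_RATE_HZ):
    """Convert a zero-based sample index into elapsed time in seconds."""
    return sample_index / rate_hz
```

For example, `sample_to_seconds(2560)` corresponds to 10 seconds into a recording.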


This dataset was collected from 20 subjects over a duration of 1 hour. The experiments measured upper-body bio-impedance with the following objectives:

a)     Evaluate the effect of externally induced perturbations at the SE interface, caused by motion, applied pressure, temperature variation and posture change, on bio-impedance measurements.

b)     Evaluate the degree of distortion due to artefacts at multiple frequencies (10 kHz to 100 kHz) in the bio-impedance measurements.


The dataset consists of two classes: COVID-19 cases and healthy cases.


Unzip the dataset
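If a graphical archive tool is not at hand, the archive can also be unpacked from Python. This is a minimal sketch using the standard `zipfile` module; the function name `unzip_dataset` and the file names in the usage note are placeholders, since the actual archive name is not given here.

```python
import zipfile
from pathlib import Path


def unzip_dataset(archive_path, target_dir):
    """Extract every entry of the archive into target_dir and return the extracted paths."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        names = archive.namelist()
    return [target / name for name in names]
```

A call such as `unzip_dataset("dataset.zip", "dataset")` (both names hypothetical) would place the contents in a `dataset` folder next to the script.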


Recent advances in the availability of computational power and cloud computing have prompted extensive research in epileptic seizure detection and prediction. The EEG (electroencephalogram) datasets from the 'Dept. of Epileptology, Univ. of Bonn' and the 'CHB-MIT Scalp EEG Database' are publicly available and are the most sought after among researchers. The Bonn dataset is very small compared to CHB-MIT, but researchers still prefer it because it comes in a simple '.txt' format. The dataset published here is a preprocessed form of CHB-MIT and is available in '.csv' format.


Procedure:

  1. Preprocessing was done in a Jupyter Notebook (Anaconda) on an Intel 8th-generation i5 processor with 8 GB of RAM.
  2. The dataset was prepared by extracting data points from the '.edf' files using the mne package in Python. Equal amounts of preictal and ictal data were extracted.
  3. A period of 4096 seconds (about 68 minutes) each of preictal and ictal data was extracted from the '.edf' files. All annotated ictal periods for the 24 patients are included in the dataset.
  4. Data points were loaded and preprocessed as dataframes using the pandas package in Python.
  5. As much system RAM as possible should be available, because the dataframes are large.
  6. The file chbmit_preprocessed_data.csv can be used as is for machine learning and deep learning models.
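Because the dataframes are large relative to 8 GB of RAM, it can help to inspect a file in chunks before loading it whole. The sketch below is one possible approach with pandas, not the procedure used to build the dataset. It assumes that every column is numeric and that the caller knows the name of the outcome column, which is not stated in this description; `count_outcomes` and `load_as_float32` are hypothetical helper names.

```python
import pandas as pd


def count_outcomes(csv_path, label_column, chunk_rows=100_000):
    """Count rows per outcome value while holding only one chunk in memory."""
    counts = {}
    reader = pd.read_csv(csv_path, usecols=[label_column], chunksize=chunk_rows)
    for chunk in reader:
        for value, n in chunk[label_column].value_counts().items():
            counts[int(value)] = counts.get(int(value), 0) + int(n)
    return counts


def load_as_float32(csv_path):
    """Load the whole file with 32-bit floats, roughly halving memory versus float64.

    Assumes every column is numeric; a non-numeric column would raise an error.
    """
    return pd.read_csv(csv_path, dtype="float32")
```

Calling `count_outcomes` on chbmit_preprocessed_data.csv with the actual outcome column name should report equal counts for '0' and '1' if the file is balanced as described.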

Data Availability:

The dataset contains the following files.

  • chbmit_ictal_raw_data.csv : This file contains only ictal data from all 24 patients. The channels vary widely across recordings and amount to 96 columns in this file.
  • chbmit_preictal_raw_data.csv : This file contains only preictal data from all 24 patients. The channels vary widely across recordings and amount to 96 columns in this file.
  • chbmit_preictal_23channels_data.csv : This file contains only preictal data from all 24 patients. Only 23 channels are retained, giving 23 columns in this file.
  • chbmit_ictal_23channels_data.csv : This file contains only ictal data from all 24 patients. Only 23 channels are retained, giving 23 columns in this file.
  • chbmit_preprocessed_data.csv : This file contains balanced preictal and ictal data from all 24 patients. Only 23 channels are retained and an outcome column is added, giving 24 columns in this file. In the outcome column, '0' indicates preictal and '1' indicates ictal.
  • 24 sheets (seizure information: patient and file number, start and stop times, data points)
  • 278 files (139 preictal + 139 ictal) named ptno_fileno_seizureORnoseizure.csv (raw data)

This dataset was prepared with data reduction techniques. Data cleaning and data transformation still need to be done as suits the application or model under development.
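As one example of such cleaning and transformation, the sketch below drops incomplete rows and standardizes each channel column to zero mean and unit variance. It is only an illustration under stated assumptions: the choice of z-scoring, the function name `prepare_features`, and the outcome column name passed by the caller are not taken from the dataset documentation.

```python
import pandas as pd


def prepare_features(df, label_column):
    """Drop rows with missing values, then z-score every non-label column."""
    clean = df.dropna()
    labels = clean[label_column].astype(int)
    features = clean.drop(columns=[label_column])
    # Population standard deviation; flat channels get 1.0 to avoid dividing by zero.
    std = features.std(ddof=0).replace(0, 1.0)
    scaled = (features - features.mean()) / std
    return scaled, labels
```

In a real experiment the mean and standard deviation would normally be computed on the training portion only and then reused for the test portion, so that no test statistics leak into training.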

The last two entries in the file list can be used to access all raw data from the 24 patients.

Original Data:


The original raw dataset in '.edf' format is available at https://physionet.org/content/chbmit/1.0.0/ and should be cited as:

Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.



This dataset contains video sequences and stereo reconstruction results supporting the IEEE Access contribution "Stereo laryngoscopic impact site prediction for droplet-based stimulation of the laryngeal adductor reflex" (J. F. Fast et al.).

See readme file for further information.


See provided readme file for instructions.


The MAUS dataset focuses on collecting easily acquired physiological signals under different mental demand conditions. We used the N-back task to induce different levels of mental workload. This dataset can help in developing mental workload assessment systems based on wearable devices, especially PPG-based systems. The MAUS dataset provides ECG, fingertip PPG, wrist PPG, and GSR signals, so users can make their own comparisons between fingertip PPG and wrist PPG. Other studies can also be carried out with this dataset.


The database is organized into two folders and a documentation file:
• Data – raw signal recordings for the individual participants, including the extracted Inter-Beat-Interval sequence and the participants' responses in the N-back task
• Subjective_rating – subjective ratings of sleep quality and NASA-TLX
• MAUS_Documentation.pdf – documentation with the dataset description and details.
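To give a feel for what can be done with an Inter-Beat-Interval sequence, the sketch below computes mean heart rate and RMSSD, a common short-term heart rate variability measure. It assumes the intervals are given in milliseconds as a plain Python list; the actual file layout and units are described in MAUS_Documentation.pdf and may differ. The function names are hypothetical.

```python
import math


def mean_heart_rate_bpm(ibi_ms):
    """Mean heart rate in beats per minute from inter-beat intervals in milliseconds."""
    mean_ibi = sum(ibi_ms) / len(ibi_ms)
    return 60000.0 / mean_ibi


def rmssd_ms(ibi_ms):
    """Root mean square of successive differences between adjacent intervals."""
    diffs = [later - earlier for earlier, later in zip(ibi_ms, ibi_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))
```

For instance, intervals of 800, 810, 790 and 800 ms give a mean heart rate of 75 beats per minute.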


This dataset contains information on 83 patients from India, including each patient's clinical history, histopathological features, and mammograms. The distinctive aspect of this dataset is its collection of mammograms with benign tumors, which can be used in the subclassification of benign tumors.


This dataset contains a zip folder of 80 mammograms and an Excel file with the mammographic, histopathological, and clinical features of all the patients.