Recent advances in scalp electroencephalography (EEG) as a neuroimaging tool have allowed researchers to overcome the technical challenges and movement restrictions typical of traditional neuroimaging studies. In particular, recent mobile EEG devices have enabled studies of cognition and motor control in natural environments that require mobility, such as art perception and production in a museum setting, and locomotion tasks.


This dataset is associated with the paper Jackson & Hall 2016, which is Open Access and can be found here:

The DataPort Repository contains the data used primarily for generating Figure 1.


** Please note that this is under construction, and all data and code are still being uploaded whilst this notice is present. Thank you. Tom **

All code is hosted as a Git repository (below), together with instructions, which can be found by clicking on the link/file called in that repository.

You are free to clone/pull this repository and use it under the MIT license, on the understanding that any use of this code will be acknowledged by citing the original paper, DOI: 10.1109/TNSRE.2016.2612001, which is Open Access and can be found here:


The Open Big Healthy Brains (OpenBHB) dataset is a large (N>5000) multi-site 3D brain MRI dataset gathering 10 public datasets (IXI, ABIDE 1, ABIDE 2, CoRR, GSP, Localizer, MPI-Leipzig, NAR, NPC, RBP) of T1 images acquired across 93 different centers, spread worldwide (North America, Europe and China). Only healthy controls have been included in OpenBHB with age ranging from 6 to 88 years old, balanced between males and females.


Please read the following sections carefully.

Dataset organization

This dataset comprises 3985 images for training and 666 images for testing (kept hidden for the challenge), both dedicated to the OpenBHB challenge. Additionally, 628 images are available with missing label information (age, sex, or scanner details); they are excluded from the current challenge. The exact content of this dataset is described in our paper.

The dataset is organized as follows:

  • All meta-data information (age, sex, site, acquisition setting, magnetic field strength, etc.) can be found in participants.tsv.
  • Corresponding T1 images pre-processed with CAT12 (VBM), FreeSurfer (SBM) and Quasi-Raw can be found in training_data.
  • The discretized (site, acquisition setting) pairs used for the OpenBHB Challenge are in official_site_class_labels.tsv.
  • Additional T1 images with missing label information are in missing_label_data.
  • The metrics used for Quality Check (e.g., Euler number for FreeSurfer) can be found in qc.tsv.


  • the templates used during the VBM analysis can be found in cat12vbm_space-MNI152_desc-gm_TPM.nii.gz.
  • the templates used during the Quasi-Raw analysis can be found in quasiraw_space-MNI152_desc-brain_T1w.nii.gz.
  • the Region-Of-Interest (ROI) names corresponding to the default CAT12 atlas (Neuromorphometrics) and the FreeSurfer Desikan and Destrieux atlases can be found in cat12vbm_labels.txt, freesurfer_atlas-desikan_labels.txt and freesurfer_atlas-destrieux_labels.txt respectively.
  • the surface-based feature names derived by FreeSurfer on both Desikan and Destrieux atlases are available in freesurfer_channels.txt.
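As a quick sanity check on the meta-data, the age range and sex balance can be read straight from participants.tsv. The sketch below is a minimal example using only the Python standard library; the column names `age` and `sex` are assumptions based on the fields listed above, so check them against the actual file header before use.

```python
import csv
from collections import Counter


def summarize_participants(tsv_path, age_col="age", sex_col="sex"):
    """Return (min_age, max_age, sex_counts) from a tab-separated file.

    The default column names are assumptions, not confirmed by the
    dataset documentation; pass the real header names if they differ.
    """
    ages = []
    sexes = Counter()
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        # DictReader uses the first row as the header.
        for row in csv.DictReader(handle, delimiter="\t"):
            ages.append(float(row[age_col]))
            sexes[row[sex_col]] += 1
    return min(ages), max(ages), dict(sexes)
```

Calling `summarize_participants("participants.tsv")` should then give the youngest and oldest ages and a count per sex label.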


If you use this dataset for your work, please use the following citation:


Dufumier, B., Grigis, A., Victor, J., Ambroise, C., Frouin, V., & Duchesnay, E. OpenBHB: a Large-Scale Multi-Site Brain MRI Data-set for Age Prediction and Debiasing. Under review.



Licence and Data Usage Agreement

This dataset is under Licence CC BY-NC-SA 3.0. By downloading this dataset, you also agree to the most restrictive Data Usage Agreement (DUA) of all cohorts (see the Data Usage Agreement terms included in this dataset):

  • ABIDE 1 [1]. Licence term CC BY-NC-SA 3.0 (ShareAlike), DUA
  • ABIDE 2 [2]. Licence term CC BY-NC-SA 3.0, DUA
  • IXI [3]. Licence term CC0, DUA
  • CoRR [4]. Licence term CC0, DUA
  • GSP [5]. Licence term DUA
  • NAR [6]. Licence term CC0
  • MPI-Leipzig [7]. Licence term CC0
  • NPC [8]. Licence term CC0
  • RBP [9,10]. Licence term CC0
  • Localizer [11]. Licence term CC BY 3.0


  [1]
  [2]
  [3]
  [4] Zuo, X.N., et al., An Open Science Resource for Establishing Reliability and Reproducibility in Functional Connectomics, (In Press)
  [5] Buckner, Randy L.; Roffman, Joshua L.; Smoller, Jordan W., 2014, "Brain Genomics Superstruct Project (GSP)", Harvard Dataverse, V10
  [6] Nastase, S. A., et al., Narratives: fMRI data for evaluating models of naturalistic language comprehension.
  [7] Babayan, A., Erbey, M., Kumral, D. et al. A mind-brain-body dataset of MRI, EEG, cognition, emotion, and peripheral physiology in young and old adults. Sci Data 6, 180308 (2019).
  [8] Sunavsky, A. and Poppenk, J. (2020). Neuroimaging predictors of creativity in healthy adults. OpenNeuro. doi: 10.18112/openneuro.ds002330.v1.1.0
  [9] Li, P., & Clariana, R. (2019). Reading comprehension in L1 and L2: An integrative approach. Journal of Neurolinguistics, 50, 94-105.
  [10] Follmer, J., Fang, S., Clariana, R., Meyer, B., & Li, P. (2018). What predicts adult readers' understanding of STEM texts? Reading and Writing, 31, 185-214.
  [11] Orfanos, D. P., Michel, V., Schwartz, Y., Pinel, P., Moreno, A., Le Bihan, D., & Frouin, V. (2017). The brainomics/localizer database. NeuroImage, 144, 309-314.

Here we present recordings from a new high-throughput instrument to optogenetically manipulate neural activity in moving


Raw Data for Liu, et al., 2021

This is the raw data corresponding to: Liu, Kumar, Sharma and Leifer, "A high-throughput method to deliver targeted optogenetic stimulation to moving C. elegans population" available at and forthcoming in PLOS Biology.

The code used to analyze this data is available on GitHub at


This dataset is publicly hosted on IEEE DataPort. It is >300 GB of data containing a very large number of individual image frames. We have bundled the data into one large .tar bundle. Download the .tar bundle and extract it before use. Consider using an AWS client to download the bundle instead of your web browser, as we have heard reports that downloading such large files through the browser can be problematic.
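Once downloaded, the bundle can be unpacked with any tar tool. As one possible approach, the sketch below uses Python's standard `tarfile` module and reads the archive as a sequential stream, which suits a file of this size. The function name and the paths passed to it are illustrative assumptions, not part of the dataset.

```python
import tarfile
from pathlib import Path


def extract_bundle(tar_path, dest_dir):
    """Extract a .tar bundle into dest_dir and return its top-level entry names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    top_level = set()
    # Mode "r|*" reads the archive as a forward-only stream, so members
    # must be extracted in the order they are encountered.
    with tarfile.open(tar_path, mode="r|*") as archive:
        for member in archive:
            archive.extract(member, path=dest)
            parts = Path(member.name).parts
            if parts:
                top_level.add(parts[0])
    return sorted(top_level)
```

Make sure the destination drive has enough free space for the extracted data before starting.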


This dataset as-is includes only the raw camera images and other output of the real-time instrument used to optogenetically activate the animal and record its motion. To extract final tracks, final centerlines, final velocity, etc., these raw outputs must be processed.

Post-processing can be done by running the /ProcessDateDirectory.m MATLAB script from  Note that post-processing was optimized to run in parallel on a high-performance computing cluster. It is computationally intensive and also requires a very large amount of RAM.

Repository Directory Structure

Recordings from the instrument are organized into directories by date, which we call "Date directories."

Each experiment is its own timestamped folder within a date directory, and it contains the following files:

  • camera_distortion.png contains camera spatial calibration information in the image metadata
  • CameraFrames.mkv is the raw camera images compressed with H.265
  • labview_parameters.csv is the settings used by the instrument in the real-time experiment
  • labview_tracks.mat contains the real-time tracking data in a MATLAB readable HDF5 format
  • projector_to_camera_distortion.png contains the spatial calibration information that maps projector pixel space into camera pixel space
  • tags.txt contains tagged information for the experiment and is used to organize and select experiments for analysis
  • timestamps.mat contains timing information saved during the real-time experiments, including closed-loop lag.
  • ConvertedProjectorFrames folder contains png compressed stimulus images converted to the camera's frame of reference.

Naming convention for individual recordings

A typical folder is 20210624_RunRailsTriggeredByTurning_Sandeep_AML67_10ulRet_red

  • 20210624 - Date the dataset was collected in format YYYYMMDD.
  • RunRailsTriggeredByTurning - Experiment type. For example, this experiment was performed in closed loop, triggered on turning. Open-loop experiments are called "RunFullWormRails" experiments for historical reasons.
  • Sandeep - Name of the experimenter
  • AML67 - C. elegans strain name. Note strain AML470 corresponds to internal strain name "AKS_483.7.e".
  • 10ulRet - Concentration of all-trans-retinal used
  • red - LED color used to stimulate. Always red for this manuscript.
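To select experiments programmatically, the folder name can be split into the fields listed above. The following is a minimal sketch that assumes every name has exactly six underscore-separated fields and a YYYYMMDD date, as described in the list; the `RecordingName` class and `parse_recording_name` function are hypothetical helpers, not part of the released code.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class RecordingName:
    collected: date
    experiment_type: str
    experimenter: str
    strain: str
    retinal: str
    led_color: str


def parse_recording_name(folder_name):
    """Split a recording folder name into its six underscore-separated fields."""
    parts = folder_name.split("_")
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}: {folder_name!r}")
    # The date is assumed to be in YYYYMMDD format.
    collected = datetime.strptime(parts[0], "%Y%m%d").date()
    return RecordingName(collected, *parts[1:])
```

For example, `parse_recording_name("20210624_RunRailsTriggeredByTurning_Sandeep_AML67_10ulRet_red").strain` gives `"AML67"`. Names that deviate from the six-field pattern raise a `ValueError` rather than being silently misread.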

Regenerating figures

Once post-processing has been run, figures from the manuscript can then be generated using scripts in

Please refer to instructions_to_generate_figures.csv for instructions on which MATLAB script to run to generate each specific figure.


The University of Turin (UniTO) released the open-access dataset Stoke collected for the homonymous Use Case 3 in the DeepHealth project. UniToBrain is a dataset of Computed Tomography (CT) perfusion images (CTP).


Visit  for the full companion code, in which a U-Net model is trained on the dataset.


Rembrandt contains data generated through the Glioma Molecular Diagnostic Initiative from 874 glioma specimens comprising approximately 566 gene expression arrays, 834 copy number arrays, and 13,472 clinical phenotype data points. These data are currently housed in Georgetown University's G-DOC System and are described in a related manuscript.


This dataset consists of EEG data from 40 epileptic seizure patients (both male and female) aged 4 to 80 years. The raw data was collected with an Allengers VIRGO EEG machine at Medisys Hospitals, Hyderabad, India. The EEG electrodes were placed according to the international 10–20 system. The EEG data was recorded from 16 channels (FP2-F4, F4-C4, C4-P4, P4-O2, FP1-F3, F3-C3, C3-P3, P3-O1, FP2-F8, F8-T4, T4-T6, T6-O2, FP1-F7, F7-T3, T3-T5, and T5-O1) at 256 samples per second.


Recent advances in the availability of computational power and cloud computing have prompted extensive research in epileptic seizure detection and prediction. The EEG (electroencephalogram) datasets from the 'Dept. of Epileptology, Univ. of Bonn' and the 'CHB-MIT Scalp EEG Database' are publicly available and are the most sought after amongst researchers. The Bonn dataset is very small compared to CHB-MIT, but researchers still prefer it because it is in a simple '.txt' format. The dataset being published here is a preprocessed form of CHB-MIT. The dataset is available in '.csv' format.


Procedure :

  1. Preprocessing was done in a Jupyter Notebook (Anaconda) on an 8th-generation Intel i5 processor with 8 GB of RAM.
  2. The dataset is prepared by extracting datapoints from the '.edf' files using the MNE package in Python. Equal amounts of preictal and ictal data are extracted.
  3. A period of 4096 seconds (about 68 minutes) each of preictal and ictal data is extracted from the '.edf' files. All annotated ictal periods for the 24 patients have been included in the dataset.
  4. Datapoints are loaded and preprocessed as dataframes using the pandas package in Python.
  5. As much system RAM as possible should be available, because the dataframes are large.
  6. The file chbmit_preprocessed_data.csv can be used as is for machine learning and deep learning models.

Data Availability :

The dataset contains the following files.

  • chbmit_ictal_raw_data.csv : This file contains only ictal data from all 24 patients. The channels vary widely across recordings and amount to 96 columns in this file.
  • chbmit_preictal_raw_data.csv : This file contains only preictal data from all 24 patients. The channels vary widely across recordings and amount to 96 columns in this file.
  • chbmit_preictal_23channels_data.csv : This file contains only preictal data from all 24 patients. Only 23 channels are retained, giving 23 columns in this file.
  • chbmit_ictal_23channels_data.csv : This file contains only ictal data from all 24 patients. Only 23 channels are retained, giving 23 columns in this file.
  • chbmit_preprocessed_data.csv : This file contains balanced preictal and ictal data from all 24 patients. Only 23 channels are retained and an outcome column is added, giving 24 columns in this file. In the outcome column, '0' indicates preictal and '1' indicates ictal.
  • 24 sheets (Seizures info: patient & file number, start-stop times, datapoints)
  • 278 files (139 preictal + 139 ictal) named ptno_fileno_seizureORnoseizure.csv (raw data)

This dataset is prepared with data reduction techniques. Data cleaning and data transformation need to be done as suitable for the application or model under development. 

The last two items in the list above can be used to access all raw data from the 24 patients.
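Because the files are large, it can help to check the class balance of chbmit_preprocessed_data.csv without loading the whole file into memory. The sketch below streams the file row by row with the standard `csv` module. The column name `outcome` is an assumption (the description above only says that an outcome column exists), so adjust `outcome_col` to match the actual header.

```python
import csv
from collections import Counter

# Label meaning as stated in the file description: '0' preictal, '1' ictal.
LABELS = {"0": "preictal", "1": "ictal"}


def count_outcomes(csv_path, outcome_col="outcome"):
    """Count rows per class by streaming the CSV one row at a time."""
    counts = Counter()
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            value = row[outcome_col].strip()
            # Accept "0"/"1" as well as "0.0"/"1.0" by normalising to an integer string.
            key = str(int(float(value)))
            counts[LABELS.get(key, key)] += 1
    return dict(counts)
```

If the file is balanced as described, the two counts returned for "preictal" and "ictal" should be equal.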

Original Data:


The original raw dataset in '.edf' is available at  and to be cited as 

Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220


This data set contains:


-88 patients


-the noncontrast computed tomography (NCCT) and computed tomography angiography (CTA) performed before thrombectomy.


-the VOI of blood clot for NCCT and CTA.


For each patient NCCT data is marked "2" and CTA is marked "1".




Dataset associated with a paper in Computer Vision and Pattern Recognition (CVPR)


"Object classification from randomized EEG trials"


If you use this code or data, please cite the above paper.


See the paper "Object classification from randomized EEG trials" on IEEE Xplore.


Code for analyzing the dataset is included in the online supplementary materials for the paper.


The code from the online supplementary materials is also included here.

