Recent advances in scalp electroencephalography (EEG) as a neuroimaging tool have now allowed researchers to overcome technical challenges and movement restrictions typical in traditional neuroimaging studies.  Fortunately, recent mobile EEG devices have enabled studies involving cognition and motor control in natural environments that require mobility, such as during art perception and production in a museum setting, and during locomotion tasks.


This dataset is associated with the paper, Jackson & Hall 2016, which is open source, and can be found here:

The DataPort Repository contains the data used primarily for generating Figure 1.


** Please note that this is under construction, and all data and code is still being uploaded whilst this notice is present. Thank-you. Tom **

All code is hosted as a GIT repository (below), as well as instructions, which can be found by clicking on the link/file called in that repository.

You are free to clone/pull this repository and use it under MIT license, on the understanding that any use of this code will be acknowledged by citing the original paper, DOI: 10.1109/TNSRE.2016.2612001, which is Open Access and can be found here:


The use of modern Mobile Brain-Body imaging techniques, combined with hyperscanning (simultaneous and synchronous recording of brain activity of multiple participants) has allowed researchers to explore a broad range of different types of social interactions from the neuroengineering perspective. In specific, this approach allows to study such type of interactions under an ecologically valid approach.


It mainly includes the original BONN epilepsy EEG data set, as well as the EEG signals after the revised tunable Q-factor wavelet transform decomposition and reconstruction of the data set.


EEG consists of collecting information from brain activity in the form of electrical voltage. Epileptic Seizure prediction and detection is a major sought after research nowadays. This dataset contains data from 11 patients of whom seizures are observed in EEG for 2 patients.


The total duration of seizures is 170 seconds. The number of channels is 16 and data is collected at 256Hz sampling rate.


The final dataset files in .csv format contain 87040 rows x 17 columns,



The Open Big Healthy Brains (OpenBHB) dataset is a large (N>5000) multi-site 3D brain MRI dataset gathering 10 public datasets (IXI, ABIDE 1, ABIDE 2, CoRR, GSP, Localizer, MPI-Leipzig, NAR, NPC, RBP) of T1 images acquired across 93 different centers, spread worldwide (North America, Europe and China). Only healthy controls have been included in OpenBHB with age ranging from 6 to 88 years old, balanced between males and females.


Please read carrefuly the following sections.

Dataset organization

This dataset comprises 3985 images for training and 666 images for test (kept hidden for the challenge), both dedicated to the OpenBHB challenge. Additionally, 628 images are available with missing label information (age, sex, or scanner details) and they are excluded for the current challenge. The exact content of this dataset is described in our paper.

The dataset is organized as follows:

  • All meta-data information (age, sex, site, acquisition setting, magnetic field strengh, etc.) can be found in participants.tsv.
  • Corresponding T1 images pre-processed with CAT12 (VBM), FSL (SBM) and Quasi-Raw can be found in training_data.
  • The pairs (site, acquisition setting) discretized used for the OpenBHB Challenge are in official_site_class_labels.tsv.
  • Additional T1 images with missing label information are in missing_label_data.
  • The metrics used for Quality Check (e.g Euler number for FreeSurfer) can be found in qc.tsv.


  • the templates used during the VBM analysis can be found in cat12vbm_space-MNI152_desc-gm_TPM.nii.gz.
  • the templates used during the Quasi-Raw analysis can be found in quasiraw_space-MNI152_desc-brain_T1w.nii.gz.
  • the Region-Of-Interest (ROI) names corresponding to the default CAT12 atlas (Neuromorphometrics) and FSL Desikan and Destrieux atlases can be found in cat12vbm_labels.txt, freesurfer_atlas-desikan_labels.txt and freesurfer_atlas-destrieux_labels.txt respectively.
  • the surface-based feature names derived by FreeSurfer on both Desikan and Destrieux atlases are available in freesurfer_channels.txt.


If you use this dataset for your work, please use the following citation:


      title={{OpenBHB: a Large-Scale Multi-Site Brain MRI Data-set for Age Prediction and Debiasing}},

      author={Dufumier, Benoit and Grigis, Antoine and Victor, Julie and Ambroise, Corentin and Frouin, Vincent and Duchesnay, Edouard},

      journal={Under review.},



Licence and Data Usage Agreement

This dataset is under Licence CC BY-NC-SA 3.0. By downloading this dataset, you also agree to the most restrictive Data Usage Agreement (DUA) of all cohorts (see the Data Usage Agreement terms included in this dataset):

  • ABIDE 1 [1]. Licence term CC BY-NC-SA 3.0 (ShareAlike), DUA
  • ABIDE 2 [2]. Licence term CC BY-NC-SA 3.0, DUA
  • IXI [3]. Licence term CC0, DUA
  • CoRR [4] Licence term CC0, DUA
  • GSP [5]  Licence term  DUA
  • NAR [6] Licence term CC0
  • MPI-Leipzig [7] Licence term CC0
  • NPC [8] Licence term CC0
  • RBP [9,10] Licence term CC0
  • Localizer [11] Licence term CC BY 3.0


  1. [1]
  2. [2]
  3. [3]
  4. [4] Zuo, X.N., et al., An Open Science Resource for Establishing Reliability and Reproducibility in Functional Connectomics, (In Press)
  5. [5] Buckner, Randy L.; Roffman, Joshua L.; Smoller, Jordan W., 2014, "Brain Genomics Superstruct Project (GSP)",, Harvard Dataverse, V10
  6. [6] Nastase, S. A., et al., Narratives: fMRI data for evaluating models of naturalistic language comprehension.
  7. [7] Babayan, A., Erbey, M., Kumral, D. et al. A mind-brain-body dataset of MRI, EEG, cognition, emotion, and peripheral physiology in young and old adults. Sci Data 6, 180308 (2019).
  8. [8] Sunavsky, A. and Poppenk, J. (2020). Neuroimaging predictors of creativity in healthy adults. OpenNeuro. doi: 10.18112/openneuro.ds002330.v1.1.0
  9. [9] Li, P., & Clariana, R. (2019) Reading comprehension in L1 and L2: An integrative appraoch. Journal of Neurolinguistics, 50, 94-105.(
  10. [10] Follmer, J., Fang, S., Clariana, R., Meyer, B., & Li, P (2018). What predicts adult readers' understanding of STEM texts? Reading and Writing, 31, 185-214.(
  11. [11] Orfanos, D. P., Michel, V., Schwartz, Y., Pinel, P., Moreno, A., Le Bihan, D., & Frouin, V. (2017). The brainomics/localizer database. NeuroImage, 144, 309-314.

Please cite the following paper when using this dataset:

N. Thakur, “Twitter Big Data as a Resource for Exoskeleton Research: A Large-Scale Dataset of about 140,000 Tweets and 100 Research Questions,” Preprints, 2022, DOI: 10.20944/preprints202206.0383.v1



This dataset contains about 140,000 Tweets related to exoskeletons. that were mined for a period of 5-years from May 21, 2017, to May 21, 2022. The tweets contain diverse forms of communications and conversations which communicate user interests, user perspectives, public opinion, reviews, feedback, suggestions, etc., related to exoskeletons.


The dataset contains only tweet identifiers (Tweet IDs) due to the terms and conditions of Twitter to re-distribute Twitter data ONLY for research purposes. They need to be hydrated to be used. The process of retrieving a tweet's complete information (such as the text of the tweet, username, user ID, date and time, etc.) using its ID is known as the hydration of a tweet ID. For hydrating this dataset the Hydrator application (link to download and a step-by-step tutorial on how to use Hydrator) may be used.


Data Description

This dataset consists of 7 .txt files. The following shows the number of Tweet IDs and the date range (of the associated tweets) in each of these files. 

Filename: Exoskeleton_TweetIDs_Set1.txt

Number of Tweet IDs – 22945, Date Range of Tweets - July 20, 2021 – May 21, 2022

Filename: Exoskeleton_TweetIDs_Set2.txt

Number of Tweet IDs – 19416, Date Range of Tweets - Dec 1, 2020 – July 19, 2021

Filename: Exoskeleton_TweetIDs_Set3.txt

Number of Tweet IDs – 16673, Date Range of Tweets - April 29, 2020 - Nov 30, 2020

Filename: Exoskeleton_TweetIDs_Set4.txt

Number of Tweet IDs – 16208, Date Range of Tweets - Oct 5, 2019 - Apr 28, 2020

Filename: Exoskeleton_TweetIDs_Set5.txt

Number of Tweet IDs – 17983, Date Range of Tweets - Feb 13, 2019 - Oct 4, 2019

Filename: Exoskeleton_TweetIDs_Set6.txt

Number of Tweet IDs – 34009, Date Range of Tweets - Nov 9, 2017 - Feb 12, 2019

Filename: Exoskeleton_TweetIDs_Set7.txt

Number of Tweet IDs – 11351, Date Range of Tweets - May 21, 2017 - Nov 8, 2017


For any questions related to the dataset, please contact Nirmalya Thakur at


Here we present recordings from a new high-throughput instrument to optogenetically manipulate neural activity in moving


Raw Data for Liu, et al., 2021

This is the raw data corresponding to: Liu, Kumar, Sharma and Leifer, "A high-throughput method to deliver targeted optogenetic stimulation to moving C. elegans population" available at and forthcoming in PLOS Biology.

The code used to analyze this data is availabe on GitHub at


This dataset is publicly hosted on IEEE DataParts. It is >300 GB of data containing many many individual image frames. We have bundled the data into one large .tar bundle. Download the .tar bundle and extract before use. Consider using an AWS client to download the bundle instead of your web browser as we have heard of reports that download such large files over the browser can be problematic.


This dataset as-is includes only raw camera and other output of the real-time instrument used to optogenetically activate the animal and record its motion. To extract final tracks, final centerlines, final velocity etc, these raw outputs must be processed.

Post-processing can be done by running the /ProcessDateDirectory.m MATLAB script from Note post processing was optimized to run in parallel on a high performance computing cluster. It is computationally intensive and also requires an egregious amount of RAM.

Repository Directory Structure

Recordings from the instrument are organized into directories by date, which we call "Date directories."

Each experiment is it's own timestamped folder within a date directory, and it contains the following files:

  • camera_distortion.png contains camera spatial calibration information in the image metadata
  • CameraFrames.mkv is the raw camera images compressed with H.265
  • labview_parameters.csv is the settings used by the instrument in the real-time experiment
  • labview_tracks.mat contains the real-time tracking data in a MATLAB readable HDF5 format
  • projector_to_camera_distortion.png contains the spatial calibration information that maps projector pixel space into camera pixel space
  • tags.txt contains tagged information for the experiment and is used to organize and select experiments for analysis
  • timestamps.mat contains timing information saved during the real-time experiments, including closed-loop lag.
  • ConvertedProjectorFrames folder contains png compressed stimulus images converted to the camera's frame of reference.

Naming convention for individual recordings

A typical folder is 210624_RunRailsTriggeredByTurning_Sandeep_AML67_10ulRet_red

  • 20210624 - Date the dataset was collected in format YYYYMMDD.
  • RunRailsTriggeredByTurning - Experiment type describes the type of experiment. For example this experiment was performed in closed loop triggered on turning. Open loop experiments are called "RunFullWormRails" experiments for historical reasons.
  • Sandeep - Name of the experimenter
  • AML67 - C. elegans strain name. Note strain AML470 corresponds to internal strain name "AKS_483.7.e".
  • 10ulRet - Concentration of all-trans-retinal used
  • red - LED color used to stimulate. Always red for this manuscript.

Regenerating figures

Once post processing has been run, figures from the mansucript can then be generated using scripts in

Please refer to instructions_to_generate_figures.csv for instructions on which Matlab script to run to generate each specific figure.


The University of Turin (UniTO) released the open-access dataset Stoke collected for the homonymous Use Case 3 in the DeepHealth project ( UniToBrain is a dataset of Computed Tomography (CT) perfusion images (CTP).


Visit to have a full companion code where a U-Net model is trained over the dataset.


Rembrandt contains data generated through the Glioma Molecular Diagnostic Initiative from 874 glioma specimens comprising approximately 566 gene expression arrays, 834 copy number arrays, and 13,472 clinical phenotype data points. These data are currently housed in Georgetown University's G-DOC System and are described in a related manuscript .