The file comprises of the sentences used as input for the given problem statement


The Open Big Healthy Brains (OpenBHB) dataset is a large (N>5000) multi-site 3D brain MRI dataset gathering 10 public datasets (IXI, ABIDE 1, ABIDE 2, CoRR, GSP, Localizer, MPI-Leipzig, NAR, NPC, RBP) of T1 images acquired across 93 different centers, spread worldwide (North America, Europe and China). Only healthy controls have been included in OpenBHB with age ranging from 6 to 88 years old, balanced between males and females.


Please read carrefuly the following sections.

Dataset organization

This dataset comprises 3985 images for training and 666 images for test (kept hidden for the challenge), both dedicated to the OpenBHB challenge. Additionally, 628 images are available with missing label information (age, sex, or scanner details) and they are excluded for the current challenge. The exact content of this dataset is described in our paper.

The dataset is organized as follows:

  • All meta-data information (age, sex, site, acquisition setting, magnetic field strengh, etc.) can be found in participants.tsv.
  • Corresponding T1 images pre-processed with CAT12 (VBM), FSL (SBM) and Quasi-Raw can be found in training_data.
  • The pairs (site, acquisition setting) discretized used for the OpenBHB Challenge are in official_site_class_labels.tsv.
  • Additional T1 images with missing label information are in missing_label_data.
  • The metrics used for Quality Check (e.g Euler number for FreeSurfer) can be found in qc.tsv.


  • the templates used during the VBM analysis can be found in cat12vbm_space-MNI152_desc-gm_TPM.nii.gz.
  • the templates used during the Quasi-Raw analysis can be found in quasiraw_space-MNI152_desc-brain_T1w.nii.gz.
  • the Region-Of-Interest (ROI) names corresponding to the default CAT12 atlas (Neuromorphometrics) and FSL Desikan and Destrieux atlases can be found in cat12vbm_labels.txt, freesurfer_atlas-desikan_labels.txt and freesurfer_atlas-destrieux_labels.txt respectively.
  • the surface-based feature names derived by FreeSurfer on both Desikan and Destrieux atlases are available in freesurfer_channels.txt.


If you use this dataset for your work, please use the following citation:


      title={{OpenBHB: a Large-Scale Multi-Site Brain MRI Data-set for Age Prediction and Debiasing}},

      author={Dufumier, Benoit and Grigis, Antoine and Victor, Julie and Ambroise, Corentin and Frouin, Vincent and Duchesnay, Edouard},

      journal={Under review.},



Licence and Data Usage Agreement

This dataset is under Licence CC BY-NC-SA 3.0. By downloading this dataset, you also agree to the most restrictive Data Usage Agreement (DUA) of all cohorts (see the Data Usage Agreement terms included in this dataset):

  • ABIDE 1 [1]. Licence term CC BY-NC-SA 3.0 (ShareAlike), DUA
  • ABIDE 2 [2]. Licence term CC BY-NC-SA 3.0, DUA
  • IXI [3]. Licence term CC0, DUA
  • CoRR [4] Licence term CC0, DUA
  • GSP [5]  Licence term  DUA
  • NAR [6] Licence term CC0
  • MPI-Leipzig [7] Licence term CC0
  • NPC [8] Licence term CC0
  • RBP [9,10] Licence term CC0
  • Localizer [11] Licence term CC BY 3.0


  1. [1]
  2. [2]
  3. [3]
  4. [4] Zuo, X.N., et al., An Open Science Resource for Establishing Reliability and Reproducibility in Functional Connectomics, (In Press)
  5. [5] Buckner, Randy L.; Roffman, Joshua L.; Smoller, Jordan W., 2014, "Brain Genomics Superstruct Project (GSP)",, Harvard Dataverse, V10
  6. [6] Nastase, S. A., et al., Narratives: fMRI data for evaluating models of naturalistic language comprehension.
  7. [7] Babayan, A., Erbey, M., Kumral, D. et al. A mind-brain-body dataset of MRI, EEG, cognition, emotion, and peripheral physiology in young and old adults. Sci Data 6, 180308 (2019).
  8. [8] Sunavsky, A. and Poppenk, J. (2020). Neuroimaging predictors of creativity in healthy adults. OpenNeuro. doi: 10.18112/openneuro.ds002330.v1.1.0
  9. [9] Li, P., & Clariana, R. (2019) Reading comprehension in L1 and L2: An integrative appraoch. Journal of Neurolinguistics, 50, 94-105.(
  10. [10] Follmer, J., Fang, S., Clariana, R., Meyer, B., & Li, P (2018). What predicts adult readers' understanding of STEM texts? Reading and Writing, 31, 185-214.(
  11. [11] Orfanos, D. P., Michel, V., Schwartz, Y., Pinel, P., Moreno, A., Le Bihan, D., & Frouin, V. (2017). The brainomics/localizer database. NeuroImage, 144, 309-314.

Dataset asscociated with a paper in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems

"Talk the talk and walk the walk: Dialogue-driven navigation in unknown indoor environments"

If you use this code or data, please cite the above paper.



See the docs directory.


This is the original data and the fitted data from the paper Scalability in Computing and Robotics by Heiko Hamann and Andreagiovanni Reina. For more information please refer to the paper.



This dataset contains video sequences and stereo reconstruction results supporting the IEEE Access contribution "Stereo laryngoscopic impact site prediction for droplet-based stimulation of the laryngeal adductor reflex" (J. F. Fast et al.).

See readme file for further information.


See provided readme file for instructions.


We focus on subjective and objective Point Cloud Quality Assessment (PCQA) in an immersive environment and study the effect of geometry and texture attributes in compression distortion. Using a Head-Mounted Display (HMD) with six degrees of freedom, we establish a subjective PCQA database named SIAT Point Cloud Quality Database (SIAT-PCQD). Our database consists of 340 distorted point clouds compressed by the MPEG point cloud encoder with the combination of 20 sequences and 17 pairs of geometry and texture quantization parameters.


GesHome dataset consists of 18 hand gestures from 20 non-professional subjects with various ages and occupation. The participant performed 50 times for each gesture in 5 days. Thus, GesHome consists of 18000 gesture samples in total. Using embedded accelerometer and gyroscope, we take 3-axial linear acceleration and 3-axial angular velocity with frequency equals to 25Hz. The experiments have been video-recorded to label the data manually using ELan tool.


We crawled large amounts of biomedical articles from PubMed for the keyphrase extraction system evaluation.

The articles, that consist of title, abstract and keyphrases provided by the authors, are used for the experiments.

In our paper, cancer-related biomedical articles are selected.


Each document in the dataset consists of title, abstract and keyphrases provided by the authors.


In  order  to  explore  the  possible  drawbacks  of  a blockchain-based  system  for  data  management for nuclear Power Plant inspections,  a  simulator  has been implemented to recreate the scenario of an inspection. The results contain execution times, source code and instructions to deploy de simulator 


This Dataset consist of 3Dmodels in Spherical harmonic coefficients andcorresponding shortcut of front view, the SH degree is 80.