The Open Big Healthy Brains (OpenBHB) dataset is a large (N>5000) multi-site 3D brain MRI dataset that aggregates 10 public datasets (IXI, ABIDE 1, ABIDE 2, CoRR, GSP, Localizer, MPI-Leipzig, NAR, NPC, RBP) of T1 images acquired across 93 different centers worldwide (North America, Europe and China). Only healthy controls are included in OpenBHB, with ages ranging from 6 to 88 years and a balanced proportion of males and females.


Please read the following sections carefully.

Dataset organization

This dataset comprises 3985 images for training and 666 images for testing (kept hidden for the challenge), both dedicated to the OpenBHB challenge. Additionally, 628 images with missing label information (age, sex, or scanner details) are available; they are excluded from the current challenge. The exact content of this dataset is described in our paper.
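As a quick sanity check, the split sizes quoted above can be added up and compared with the N>5000 figure given in the overview. The snippet below is only an illustrative sketch; the dictionary keys are names chosen here and do not come from the dataset.

```python
# Image counts as stated in the dataset description.
# The key names are illustrative labels, not official split names.
split_sizes = {
    "train": 3985,
    "test_hidden": 666,
    "missing_labels": 628,
}

# Images used in the challenge (training + hidden test).
challenge_total = split_sizes["train"] + split_sizes["test_hidden"]

# All images distributed with the dataset, including those with missing labels.
overall_total = sum(split_sizes.values())

print(challenge_total)  # 4651
print(overall_total)    # 5279
```

The overall total of 5279 images is consistent with the N>5000 figure.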

The dataset is organized as follows:

  • All metadata (age, sex, site, acquisition setting, magnetic field strength, etc.) can be found in participants.tsv.
  • Corresponding T1 images pre-processed with CAT12 (VBM), FreeSurfer (SBM) and Quasi-Raw can be found in training_data.
  • The discretized (site, acquisition setting) pairs used for the OpenBHB Challenge are in official_site_class_labels.tsv.
  • Additional T1 images with missing label information are in missing_label_data.
  • The metrics used for Quality Check (e.g., the Euler number for FreeSurfer) can be found in qc.tsv.
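The metadata file listed above is tab-separated, so it can be inspected with the Python standard library alone. The following is a minimal sketch, assuming that participants.tsv has a header row with columns named `age` and `sex`; these column names are assumptions and should be checked against the actual file. The demo rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import Counter


def summarize_participants(tsv_file):
    """Summarize a participants-style TSV given as an open text file.

    Returns (n_rows, min_age, max_age, counts_per_sex).
    Assumes columns named 'age' and 'sex' (hypothetical names).
    """
    reader = csv.DictReader(tsv_file, delimiter="\t")
    ages = []
    sexes = Counter()
    for row in reader:
        ages.append(float(row["age"]))
        sexes[row["sex"]] += 1
    if not ages:
        raise ValueError("no participant rows found")
    return len(ages), min(ages), max(ages), dict(sexes)


# Toy in-memory example (made-up rows, not real OpenBHB data).
toy_tsv = io.StringIO(
    "participant_id\tage\tsex\n"
    "1\t20.5\tfemale\n"
    "2\t31.0\tmale\n"
)
print(summarize_participants(toy_tsv))
```

With the real file, the same function could be called as `summarize_participants(open("participants.tsv", newline=""))`, provided the assumed column names match.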


  • The template used during the VBM analysis can be found in cat12vbm_space-MNI152_desc-gm_TPM.nii.gz.
  • The template used during the Quasi-Raw analysis can be found in quasiraw_space-MNI152_desc-brain_T1w.nii.gz.
  • The Region-Of-Interest (ROI) names corresponding to the default CAT12 atlas (Neuromorphometrics) and to the FreeSurfer Desikan and Destrieux atlases can be found in cat12vbm_labels.txt, freesurfer_atlas-desikan_labels.txt and freesurfer_atlas-destrieux_labels.txt, respectively.
  • The surface-based feature names derived by FreeSurfer on both the Desikan and Destrieux atlases are available in freesurfer_channels.txt.
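The label files listed above can be turned into an index-to-name lookup for ROI features. The sketch below assumes that each file contains one name per line and that the line order matches the feature order; both points are assumptions to verify against the files themselves. The names in the demo are placeholders, not real atlas labels.

```python
def read_label_names(lines):
    """Map a 0-based position to a label name.

    `lines` is any iterable of strings, e.g. an open text file.
    Assumes one name per line (hypothetical layout); blank lines are skipped.
    """
    names = [line.strip() for line in lines if line.strip()]
    return {index: name for index, name in enumerate(names)}


# Toy example with placeholder names (not real atlas labels).
toy_lines = ["roi_a\n", "roi_b\n", "\n", "roi_c\n"]
lookup = read_label_names(toy_lines)
print(lookup[1])  # roi_b
```

With a real file, the function could be called as `read_label_names(open("cat12vbm_labels.txt"))`.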


If you use this dataset for your work, please use the following citation:

      Dufumier, B., Grigis, A., Victor, J., Ambroise, C., Frouin, V., & Duchesnay, E. OpenBHB: a Large-Scale Multi-Site Brain MRI Data-set for Age Prediction and Debiasing. Under review.

Licence and Data Usage Agreement

This dataset is released under the CC BY-NC-SA 3.0 licence. By downloading this dataset, you also agree to the most restrictive Data Usage Agreement (DUA) of all cohorts (see the Data Usage Agreement terms included in this dataset):

  • ABIDE 1 [1]. Licence term: CC BY-NC-SA 3.0 (ShareAlike), DUA
  • ABIDE 2 [2]. Licence term: CC BY-NC-SA 3.0, DUA
  • IXI [3]. Licence term: CC0, DUA
  • CoRR [4]. Licence term: CC0, DUA
  • GSP [5]. Licence term: DUA
  • NAR [6]. Licence term: CC0
  • MPI-Leipzig [7]. Licence term: CC0
  • NPC [8]. Licence term: CC0
  • RBP [9,10]. Licence term: CC0
  • Localizer [11]. Licence term: CC BY 3.0


  [1]
  [2]
  [3]
  [4] Zuo, X. N., et al. An Open Science Resource for Establishing Reliability and Reproducibility in Functional Connectomics. (In press.)
  [5] Buckner, R. L., Roffman, J. L., & Smoller, J. W. (2014). Brain Genomics Superstruct Project (GSP). Harvard Dataverse, V10.
  [6] Nastase, S. A., et al. Narratives: fMRI data for evaluating models of naturalistic language comprehension.
  [7] Babayan, A., Erbey, M., Kumral, D., et al. (2019). A mind-brain-body dataset of MRI, EEG, cognition, emotion, and peripheral physiology in young and old adults. Sci Data, 6, 180308.
  [8] Sunavsky, A., & Poppenk, J. (2020). Neuroimaging predictors of creativity in healthy adults. OpenNeuro. doi: 10.18112/openneuro.ds002330.v1.1.0
  [9] Li, P., & Clariana, R. (2019). Reading comprehension in L1 and L2: An integrative approach. Journal of Neurolinguistics, 50, 94-105.
  [10] Follmer, J., Fang, S., Clariana, R., Meyer, B., & Li, P. (2018). What predicts adult readers' understanding of STEM texts? Reading and Writing, 31, 185-214.
  [11] Orfanos, D. P., Michel, V., Schwartz, Y., Pinel, P., Moreno, A., Le Bihan, D., & Frouin, V. (2017). The brainomics/localizer database. NeuroImage, 144, 309-314.