OpenBHB: a Multi-Site Brain MRI Dataset for Age Prediction and Debiasing

Citation Author(s):
Benoit Dufumier, CEA Saclay
Antoine Grigis, CEA Saclay
Julie Victor, CEA Saclay
Corentin Ambroise, CEA Saclay
Vincent Frouin, CEA Saclay
Edouard Duchesnay, CEA Saclay
Submitted by:
Benoit Dufumier
Last updated:
Tue, 09/20/2022 - 11:37
DOI:
10.21227/7jsg-jx57

Abstract 

The Open Big Healthy Brains (OpenBHB) dataset is a large (N>5000) multi-site 3D brain MRI dataset aggregating 10 public datasets (IXI, ABIDE 1, ABIDE 2, CoRR, GSP, Localizer, MPI-Leipzig, NAR, NPC, RBP) of T1 images acquired across 93 different centers spread worldwide (North America, Europe and China). Only healthy controls have been included in OpenBHB, with ages ranging from 6 to 88 years and a balanced number of males and females. All T1 images have been uniformly pre-processed with CAT12 (SPM), FreeSurfer and Quasi-Raw (in-house minimal pre-processing), and all passed a visual quality check. Both Voxel-Based Morphometry and Surface-Based Morphometry measures are available for each T1 MRI. Participants' age and sex are provided, as well as the acquisition site, MRI magnetic field strength and MRI scanner settings used for each image acquisition.

Note: OpenBHB has been divided into an official train, validation and test split for the ongoing open challenge on brain age prediction and site-effect removal (see below). To avoid any data leakage during this challenge, the test data are kept private on the submission servers, where the challenge metrics are computed. Only the training and validation data are openly available for now.

The OpenBHB Challenges

  1. Brain age prediction and debiasing with site-effect removal

OpenBHB has been designed for brain age prediction and debiasing with site-effect removal in current brain MRI datasets through representation learning. The challenge consists of developing new algorithms that take as input the T1 MRI images available in OpenBHB and output representation vectors that preserve the biological variability (age) while removing undesirable non-biological confounding variables (acquisition site/settings). Representation quality is evaluated through linear probing on brain age prediction and site debiasing with various metrics (e.g., Mean Absolute Error). All algorithms can be submitted on RAMP (check out our webpage for more details), with a public record of their performance and an official leaderboard. This challenge should promote reproducible research in neuroimaging, and it tackles two active topics in both computer vision and neuroimaging: representation learning and debiasing.
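
To make the idea of linear probing concrete, here is a minimal sketch of how a frozen representation could be scored on age prediction with the Mean Absolute Error. It is not the official challenge code: the representations are synthetic random vectors standing in for an encoder's output, and the choice of ridge regression, the `alpha` value and the helper names (`fit_ridge`, `mean_absolute_error`) are assumptions made for illustration only.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean  # center features so the intercept is not penalized
    A = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    w = np.linalg.solve(A, Xc.T @ (y - y_mean))
    b = y_mean - x_mean @ w
    return w, b


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted ages."""
    return float(np.mean(np.abs(y_true - y_pred)))


# Synthetic stand-ins (assumption): in practice Z_train and Z_val would be
# the representation vectors produced by a frozen encoder on the public
# train and validation splits, and age_* the participants' ages.
rng = np.random.default_rng(0)
n_train, n_val, dim = 400, 100, 16
true_w = rng.normal(size=dim)
Z_train = rng.normal(size=(n_train, dim))
Z_val = rng.normal(size=(n_val, dim))
age_train = 40 + 5 * Z_train @ true_w + rng.normal(scale=2.0, size=n_train)
age_val = 40 + 5 * Z_val @ true_w + rng.normal(scale=2.0, size=n_val)

# Linear probe: only the linear model is fitted; the representation is fixed.
w, b = fit_ridge(Z_train, age_train, alpha=1.0)
mae_probe = mean_absolute_error(age_val, Z_val @ w + b)

# Reference point: always predict the mean training age.
mae_baseline = mean_absolute_error(age_val, np.full(n_val, age_train.mean()))

print(f"linear-probe MAE: {mae_probe:.2f} years")
print(f"mean-age baseline MAE: {mae_baseline:.2f} years")
```

The same pattern can be applied to the site variable by fitting a linear classifier on the representation instead of a regressor. In that case a good debiased representation is one from which the acquisition site is hard to predict, while age remains easy to predict.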

Comments

Good morning, sir. I am not able to access the dataset to analyse it. Could you please provide the link to the dataset? I would be thankful if you could provide it.

Submitted by Sumiran Singh on Tue, 10/17/2023 - 01:39