Multi-Task Faces (MTF) dataset

Citation Author(s):
Universitat Rovira i Virgili, Department of Computer Engineering and Mathematics
Submitted by:
Rami Haffar
Last updated:
Wed, 05/22/2024 - 13:18

Human facial data hold tremendous potential to address a variety of classification problems, including face recognition, age estimation, gender identification, emotion analysis, and race classification. However, recent privacy regulations, such as the EU General Data Protection Regulation (GDPR), have restricted the ways in which human images may be collected and used for research. As a result, several previously published data sets containing human faces have been removed from the internet because their data collection methods failed to meet privacy regulations. Data sets consisting of synthetic data have been proposed as an alternative, but they fall short of accurately representing the real data distribution. Moreover, most available data sets are labeled for just a single task, which limits their applicability. To address these issues, we present a collection of face images designed for various classification tasks, including face recognition and classification by race, gender, and age, as well as for training generative networks. We named this collection the Multi-Task Faces (MTF) data set, and it is provided in two flavors: a non-curated data set that includes 132,816 images of 640 individuals, and a manually curated version with 5,246 images of 240 individuals meticulously selected to maximize their classification quality. The MTF data sets have been ethically gathered by leveraging publicly available images of celebrities and strictly adhering to copyright regulations. In addition to presenting the data and providing detailed descriptions of the collection and processing procedures followed, we also evaluate the suitability of the data for training five deep learning (DL) models across the aforementioned classification tasks.


The main folder of the curated data set comprises three subsets, 'Train', 'Val', and 'Test', to be used, respectively, for AI model training, AI model validation and hyperparameter tuning, and AI model testing (evaluation of the trained model). The non-curated data set has no such split: its root folder sits at the same level as a single subset of the curated data set. Each subset of the curated data set, and likewise the root folder of the non-curated data set, contains four folders corresponding to the race classes: 'Asian_chinese_korean', 'Asian_indian', 'Black', and 'White'. Within each race folder, there are two folders named 'Males' and 'Females', representing the two labels used for gender classification. Inside each gender folder, there are two more folders labeled 'Young' and 'Old', which indicate the two age categories used for age classification. Within each age folder, there is one folder per identity. With this structure, researchers can rearrange the data according to their specific preferences and requirements, e.g., by defining multi-criteria classification tasks.
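Because every label is encoded in the folder path, the labels of an image can be recovered from its location alone. The following Python sketch illustrates one way to do this under the layout described above (race/gender/age/identity). It is not part of the data set distribution: the names `Sample`, `iter_samples`, and `multi_criteria_label` are hypothetical, and the code assumes that image files sit directly inside the identity folders and use one of the extensions listed in `IMAGE_SUFFIXES`.

```python
from pathlib import Path
from typing import Iterator, NamedTuple

# Assumption: image files use one of these extensions.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


class Sample(NamedTuple):
    """One image together with the labels encoded in its folder path."""
    path: Path
    race: str
    gender: str
    age: str
    identity: str


def iter_samples(subset_root: Path) -> Iterator[Sample]:
    """Yield a Sample for every image found under subset_root.

    subset_root is a subset folder such as 'Train' in the curated data set,
    or the root folder of the non-curated data set.
    Assumed layout: <race>/<gender>/<age>/<identity>/<image file>.
    """
    for path in sorted(subset_root.glob("*/*/*/*/*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            race, gender, age, identity = path.relative_to(subset_root).parts[:4]
            yield Sample(path, race, gender, age, identity)


def multi_criteria_label(sample: Sample) -> str:
    """Combine race, gender, and age into one label for a joint task."""
    return f"{sample.race}/{sample.gender}/{sample.age}"
```

For example, `list(iter_samples(Path("Train")))` would collect the training samples, and applying `multi_criteria_label` to each sample would give the targets for a joint race, gender, and age classification task. Single-task labels are available directly as the `race`, `gender`, `age`, and `identity` fields.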