Individualized Deepfake Detection Dataset

Citation Author(s):
Mushfiqur Rahman
North Carolina State University
Submitted by:
Mushfiqur Rahman
Last updated:
Sat, 01/20/2024 - 18:51
DOI:
10.21227/w7ma-fp34

Abstract 

The deepfake face detection task takes a facial image of unknown authenticity as its test input. Most deepfake detection methods take only the image as input, but our paper demonstrates that conditioning the detector on identity (i.e., knowing whose face the picture might depict) can enhance detection performance. Existing deepfake detection datasets, such as FaceForensics++ and DFDC, do not include identity information for authentic and deepfake faces. This dataset contains facial images of 45 specific individuals, divided into train and test sets, with a total of 23k authentic and 22k deepfake images. Because each individual's images appear in both the train and test sets, detection performance can be assessed for that individual. The train and test sets come from two independent sources: the train images are curated from the CelebDFv2 dataset, and the test images are curated from the CACD dataset. Deepfake faces are generated using FaceswapGAN, with a portion of the train images used to train the reconstruction model. The test deepfake images are face-swapped with another identity not included in our celebrity list. In contrast, the train deepfake images are reconstructed images of the same person. The deepfake detection method proposed in our paper requires reconstructions of both the train and test images, so the reconstructed train and test images are also included in this dataset. Note that reconstructing the training deepfake images produces doubly reconstructed images.

Instructions: 

Main dataset

Training:

Authentic images: celebdf.tar.gz

Deepfake images: celebdf_recons.tar.gz

Testing:

Authentic images: cacd.tar.gz

Deepfake images: cacd_deepfake.tar.gz

Individual list

celebrity_list.txt
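
The main-dataset files listed above can be unpacked into a split/label layout with a short script. The following is a minimal sketch, not part of the dataset itself: the archive names are taken from the listing, while the function names, the output directory layout, and the one-identity-per-line format of celebrity_list.txt are assumptions made for illustration. The internal folder structure of each archive is not documented here, so the script preserves whatever structure it finds.

```python
import tarfile
from pathlib import Path

# Archive names come from the listing above.
# The (split, label) tags are illustrative labels chosen for this sketch.
MAIN_ARCHIVES = {
    "celebdf.tar.gz": ("train", "authentic"),
    "celebdf_recons.tar.gz": ("train", "deepfake"),
    "cacd.tar.gz": ("test", "authentic"),
    "cacd_deepfake.tar.gz": ("test", "deepfake"),
}


def extract_main_archives(download_dir, out_dir):
    """Extract each downloaded archive into out_dir/<split>/<label>/.

    Returns a dict mapping (split, label) to the extraction directory.
    Archives that are not present in download_dir are skipped.
    """
    extracted = {}
    for name, (split, label) in MAIN_ARCHIVES.items():
        archive = Path(download_dir) / name
        if not archive.exists():
            continue  # not downloaded yet
        target = Path(out_dir) / split / label
        target.mkdir(parents=True, exist_ok=True)
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(target)  # keeps the archive's own folder structure
        extracted[(split, label)] = target
    return extracted


def read_identity_list(path):
    """Read the identity list, ASSUMING one identity per line."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]
```

Calling extract_main_archives("downloads", "data") would then give directories such as data/train/authentic and data/test/deepfake, which can be passed to whatever image loader is in use.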

Double neural network

Reconstructed images required for our proposed double neural network method:

Training:

Reconstructed authentic images: celebdf_recons.tar.gz (same as deepfake training images)

Reconstructed deepfake images: celebdf_double_recons.tar.gz

Testing:

Reconstructed authentic images: cacd_recons.tar.gz

Reconstructed deepfake images: cacd_deepfake_recons.tar.gz
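
The listing above implies that each image archive has a matching archive of reconstructions. The sketch below records that correspondence and pairs each image with its reconstruction after extraction. It is an illustration under one loud assumption: that a reconstructed image has the same relative path and file name as the image it was reconstructed from. This page does not confirm that naming scheme, so the function also returns the images for which no counterpart was found. The function and variable names are hypothetical.

```python
from pathlib import Path

# Input archive -> archive holding its reconstruction, per the listing above.
RECONSTRUCTION_OF = {
    "celebdf.tar.gz": "celebdf_recons.tar.gz",                # train, authentic
    "celebdf_recons.tar.gz": "celebdf_double_recons.tar.gz",  # train, deepfake
    "cacd.tar.gz": "cacd_recons.tar.gz",                      # test, authentic
    "cacd_deepfake.tar.gz": "cacd_deepfake_recons.tar.gz",    # test, deepfake
}


def pair_with_reconstructions(image_root, recon_root):
    """Pair every file under image_root with the same relative path under recon_root.

    ASSUMPTION: reconstructions reuse the relative path of their source image.
    Returns (pairs, missing): pairs is a list of (image, reconstruction) paths,
    missing is a list of images that have no reconstruction at that path.
    """
    image_root, recon_root = Path(image_root), Path(recon_root)
    pairs, missing = [], []
    for image in sorted(p for p in image_root.rglob("*") if p.is_file()):
        recon = recon_root / image.relative_to(image_root)
        if recon.is_file():
            pairs.append((image, recon))
        else:
            missing.append(image)
    return pairs, missing
```

If the missing list comes back non-empty for a correctly extracted pair of archives, the same-relative-path assumption does not hold and the pairing rule should be adapted to the actual file names.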