Individualized Deepfake Detection Dataset
- Submitted by: Mushfiqur Rahman
- Last updated: Sat, 01/20/2024 - 18:51
- DOI: 10.21227/w7ma-fp34
Abstract
Deepfake face detection takes a facial image of unknown authenticity as its test input. While most deepfake detection methods take only the image as input, our paper demonstrates that conditioning the detector on identity (i.e., knowing whose face the image claims to show) can improve detection performance. Existing deepfake detection datasets, such as FaceForensics++ and DFDC, do not include identity information for authentic and deepfake faces. This dataset contains facial images of 45 specific individuals, divided into train and test sets, with a total of 23k authentic and 22k deepfake images. Because each individual's images appear in both the train and test sets, detection performance can be assessed per individual. The train and test sets come from two independent sources: the train images are curated from the CelebDFv2 dataset, and the test images from the CACD dataset. Deepfake faces are generated using FaceswapGAN, with a portion of the train images used to train the reconstruction model. The test deepfake images are face-swapped with another identity not included in our celebrity list, whereas the train deepfake images are reconstructed images of the same person. The deepfake detection method proposed in our paper requires reconstructing both the train and test images, so the reconstructed train and test images are also included in this dataset. Note that reconstructing the training deepfake images, which are themselves reconstructions, produces doubly reconstructed images.
Main dataset
Training:
Authentic images: celebdf.tar.gz
Deepfake images: celebdf_recons.tar.gz
Testing:
Authentic images: cacd_auth.tar.gz
Deepfake images: cacd_deepfake.tar.gz
Individual list
celebrity_list.txt
Double neural network:
Reconstructed images required for our proposed double neural network method:
Training:
Reconstructed authentic images: celebdf_recons.tar.gz (same as deepfake training images)
Reconstructed deepfake images: celebdf_double_recons.tar.gz
Testing:
Reconstructed authentic images: cacd_auth_recons.tar.gz
Reconstructed deepfake images: cacd_deepfake_recons.tar.gz
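As a rough illustration of how the archives above map onto the train/test splits, the following Python sketch records each role and unpacks a downloaded archive. The role labels and the `extract_archive` helper are our own hypothetical names, not part of the dataset. The two authentic test archives use the names given under Dataset Files (`cacd_auth.tar.gz`, `cacd_auth_recons.tar.gz`). The internal folder layout of the archives is not documented here, so the helper only returns the extracted file paths and makes no assumption about per-identity subfolders.

```python
import tarfile
from pathlib import Path

# Archive names as listed under "Dataset Files"; the (split, kind) keys are
# illustrative labels chosen for this sketch, not names used by the dataset.
ARCHIVES = {
    ("train", "authentic"): "celebdf.tar.gz",
    ("train", "deepfake"): "celebdf_recons.tar.gz",
    ("test", "authentic"): "cacd_auth.tar.gz",
    ("test", "deepfake"): "cacd_deepfake.tar.gz",
    # Reconstructions needed by the double neural network method.
    ("train", "authentic_recons"): "celebdf_recons.tar.gz",  # same file as train deepfakes
    ("train", "deepfake_recons"): "celebdf_double_recons.tar.gz",
    ("test", "authentic_recons"): "cacd_auth_recons.tar.gz",
    ("test", "deepfake_recons"): "cacd_deepfake_recons.tar.gz",
}


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return the extracted file paths."""
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            target = (dest / member.name).resolve()
            # Skip any member that would be written outside dest_dir.
            if target != dest and dest not in target.parents:
                continue
            tar.extract(member, path=dest)
            if member.isfile():
                extracted.append(target)
    return extracted
```

For example, `extract_archive(ARCHIVES[("train", "authentic")], "data/train/authentic")` would unpack the authentic training images, assuming the archive has already been downloaded into the working directory.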
Dataset Files
- celebrity_list.txt (1.06 kB)
- feature-xception.tar.gz (363.17 MB)
- required-final.tar.gz (848.20 MB)
- feature-efficientnet.tar.gz (392.35 MB)
- celebdf.tar.gz (40.44 MB)
- celebdf_recons.tar.gz (38.70 MB)
- celebdf_double_recons.tar.gz (38.19 MB)
- cacd_auth.tar.gz (12.46 MB)
- cacd_auth_recons.tar.gz (11.64 MB)
- cacd_deepfake.tar.gz (8.39 MB)
- cacd_deepfake_recons.tar.gz (8.09 MB)
- compact-train-swapped.tar.gz (2.12 GB)
- finetuned-on-compact-train-swapped.zip (96.61 MB)
- finetuned-backbone.tar.gz (96.55 MB)