Abstract

A speech dataset for fake speech detection. The fake speech is generated by 8 well-known, recent deep-learning-based open-source tools and 8 commercial speech synthesis products. All speech is in Mandarin Chinese or English. The dataset contains more than 127,890 synthetic utterances and 14,400 natural utterances.

Instructions:

To create this dataset, we collected real speech utterances from the VCTK base corpus and the Aishell-1 database, and used a special set of phrases to generate utterances from each TTS or VC system. Each utterance in IIEAFC has a duration randomly set between 2 s and 10 s, a sampling rate of 44.1 kHz, and 16-bit quantization, and is stored in WAV format.
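The format properties stated above (44.1 kHz sampling rate, 16-bit quantization, 2 s to 10 s duration) can be checked per file before training. The following is a minimal sketch using only Python's standard `wave` module; the function name `check_utterance` and the returned keys are hypothetical and not part of the dataset's tooling, and the file path is whatever location the user has stored an utterance at.

```python
import wave


def check_utterance(path):
    """Check one WAV file against the format described for the dataset.

    Hypothetical helper: the thresholds come from the dataset description
    (44.1 kHz, 16-bit, 2 s to 10 s); the function itself is an illustration.
    """
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()         # samples per second
        sample_width_bytes = wav.getsampwidth()  # 2 bytes == 16-bit
        duration_s = wav.getnframes() / sample_rate
    return {
        "sample_rate_ok": sample_rate == 44100,
        "bit_depth_ok": sample_width_bytes == 2,
        "duration_ok": 2.0 <= duration_s <= 10.0,
    }
```

A file that fails any of the three checks may have been resampled or truncated after download and is worth inspecting by hand.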

The IIEAFC dataset is partitioned into three disjoint subsets, namely training, development, and evaluation, which comprise 50,000, 50,000, and 42,290 utterances respectively. While the training and development sets contain fake speech generated with the same algorithms (designated as known attacks), the evaluation set also contains attacks generated with different algorithms (designated as unknown attacks).
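As a quick consistency check, the partition sizes can be compared against the utterance counts given in the abstract. The sketch below assumes the abstract's figure of 127,890 synthetic utterances is an exact count (the abstract says "more than"); the variable names are illustrative only.

```python
# Partition sizes as stated in the dataset description.
splits = {"training": 50_000, "development": 50_000, "evaluation": 42_290}

# Counts from the abstract; treating 127,890 as exact is an assumption.
synthetic = 127_890
natural = 14_400

total = sum(splits.values())
print(total)                         # 142290
print(total == synthetic + natural)  # True
```

Under that assumption the three partitions account for every utterance in the dataset.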

Comments

Great work!

Submitted by Ricardo Perez on Thu, 12/10/2020 - 19:47

Thanks

Submitted by xiang xia on Tue, 12/15/2020 - 21:12

thanks

Submitted by Jiangyan Yi on Thu, 03/11/2021 - 03:34

good

Submitted by kaustav mukherjee on Thu, 06/17/2021 - 13:23

cannot download

Submitted by kaustav mukherjee on Thu, 06/17/2021 - 13:24

Submitted by Lori Buyanovsky on Sun, 08/14/2022 - 10:54

cannot download

Submitted by Lori Buyanovsky on Sun, 08/14/2022 - 10:55

Thanks for sharing! How can I download this dataset?

Submitted by Weichen Lian on Wed, 08/02/2023 - 22:58

How can I download the dataset? I am unable to download it

Submitted by Virinchi Sai At... on Mon, 11/06/2023 - 12:20

Dataset Files

Files have not been uploaded for this dataset

Standard Dataset

SynSpeechDDB: a new synthetic speech detection database