SynSpeechDDB: a new synthetic speech detection database

Citation Author(s): Zhenyu Zhang, Yewei Gu, Xiaowei Yi, Xianfeng Zhao
Submitted by: Zhenyu Zhang
Last updated: Thu, 10/22/2020 - 10:04
DOI: 10.21227/ta8z-mx73


Categories: Artificial Intelligence


Abstract

A speech dataset for fake speech detection. The fake speech is generated by 8 well-known, recent deep-learning-based open-source tools and 8 commercial speech synthesis products. The dataset contains more than 127,890 synthetic utterances and 14,400 natural utterances in English and Mandarin Chinese.

Instructions:

To create this dataset, we collected real speech utterances from the VCTK base corpus and the Aishell-1 database, and used a special set of phrases to generate utterances from each TTS or VC system. Each utterance in IIEAFC has a duration randomly set between 2 s and 10 s, a sampling rate of 44.1 kHz, and 16-bit quantization, and is stored in WAV format.
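The stated file format can be checked mechanically. The following is a minimal sketch, not part of the dataset's own tooling, that uses Python's standard `wave` module to compare one file against the properties listed above (44.1 kHz, 16-bit, 2 s to 10 s). The function name `check_utterance` and the constant names are hypothetical, and treating the 2 s and 10 s bounds as inclusive is an assumption.

```python
import wave

# Values taken from the description above.
EXPECTED_RATE_HZ = 44_100        # 44.1 kHz sampling rate
EXPECTED_SAMPLE_WIDTH = 2        # 16-bit quantization = 2 bytes per sample
MIN_DURATION_S = 2.0             # assumed inclusive lower bound
MAX_DURATION_S = 10.0            # assumed inclusive upper bound


def check_utterance(path):
    """Return a list describing how the WAV file at `path` deviates from the stated format.

    An empty list means the file matches. `check_utterance` is a hypothetical helper.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        duration = wav.getnframes() / rate

    problems = []
    if rate != EXPECTED_RATE_HZ:
        problems.append(f"sampling rate {rate} Hz")
    if width != EXPECTED_SAMPLE_WIDTH:
        problems.append(f"sample width {8 * width} bits")
    if not MIN_DURATION_S <= duration <= MAX_DURATION_S:
        problems.append(f"duration {duration:.2f} s")
    return problems
```

Calling `check_utterance` on each file and collecting the non-empty results would give a quick report of any utterances that do not match the description.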

The IIEAFC dataset is partitioned into three disjoint subsets: training (50,000 utterances), development (50,000 utterances), and evaluation (42,290 utterances). The training and development sets contain fake speech generated with the same algorithms (designated as known attacks), whereas the evaluation set also contains attacks generated with different algorithms (designated as unknown attacks).
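As a quick consistency check on these figures, the sketch below sums the three partition sizes and compares the total with the synthetic and natural utterance counts given in the abstract. Treating the abstract's figures as exact counts is an assumption (the abstract says "more than 127,890"), and the variable names are hypothetical.

```python
# Partition sizes as stated in the instructions.
SPLIT_SIZES = {
    "training": 50_000,
    "development": 50_000,
    "evaluation": 42_290,
}

# Counts from the abstract, assumed here to be exact.
SYNTHETIC_UTTERANCES = 127_890
NATURAL_UTTERANCES = 14_400

total = sum(SPLIT_SIZES.values())
matches_abstract = total == SYNTHETIC_UTTERANCES + NATURAL_UTTERANCES

# Share of the whole dataset held by each partition, in percent.
split_share = {name: 100 * size / total for name, size in SPLIT_SIZES.items()}

print(total, matches_abstract)
for name, share in split_share.items():
    print(f"{name}: {share:.1f}%")
```

Under that assumption the partition sizes account for every utterance, with the evaluation set slightly smaller than the other two.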


Dataset Files

Files have not been uploaded for this dataset
