SynSpeechDDB: a new synthetic speech detection database

Citation Author(s):
Zhenyu Zhang
Yewei Gu
Xiaowei Yi
Xianfeng Zhao
Submitted by:
Zhenyu Zhang
DOI:
10.21227/ta8z-mx73

Abstract

A speech dataset for fake (synthetic) speech detection. The fake speech was generated with eight recent, well-known, open-source deep-learning tools and eight commercial speech synthesis products. The dataset contains more than 127,890 synthetic utterances and 14,400 natural utterances in English and Mandarin Chinese.

Instructions:

To create this dataset, we collected real speech utterances from the VCTK base corpus and the Aishell-1 database, and used a special set of phrases to generate utterances from each text-to-speech (TTS) or voice conversion (VC) system. Each utterance in IIEAFC has a duration chosen at random between 2 s and 10 s, a sampling rate of 44.1 kHz, and 16-bit quantization, and is stored in WAV format.
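The stated audio format can be checked per file once audio is available. The following is a minimal sketch, not part of the dataset's tooling: it uses Python's standard `wave` module, and the function name `check_utterance` and the constants are assumptions introduced here for illustration. It only covers uncompressed PCM WAV files and does not check the number of channels, which the description does not specify.

```python
import wave

# Expected properties, taken from the dataset description above.
EXPECTED_RATE_HZ = 44100
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit quantization
MIN_DURATION_S = 2.0
MAX_DURATION_S = 10.0


def check_utterance(path):
    """Return a list of format problems for one WAV file (empty if none)."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        frames = wav.getnframes()
    if rate != EXPECTED_RATE_HZ:
        problems.append(f"sampling rate is {rate} Hz")
    if width != EXPECTED_SAMPLE_WIDTH_BYTES:
        problems.append(f"sample width is {8 * width} bits")
    # Duration in seconds is the frame count divided by the frame rate.
    duration = frames / rate if rate else 0.0
    if not MIN_DURATION_S <= duration <= MAX_DURATION_S:
        problems.append(f"duration is {duration:.2f} s")
    return problems
```

Calling `check_utterance` on each file and collecting the non-empty results gives a quick list of utterances that deviate from the description.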

The IIEAFC dataset is partitioned into three disjoint subsets, namely training, development, and evaluation, which comprise 50,000, 50,000, and 42,290 utterances respectively. The training and development sets contain fake speech generated with the same algorithms (designated as known attacks), whereas the evaluation set also contains attacks generated with different algorithms (designated as unknown attacks).
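The known/unknown attack protocol can be expressed as a simple set check. The sketch below is illustrative only: the record layout `(split, attack)`, the split names, and the attack identifiers such as `A01` are hypothetical, since the dataset's metadata format is not described here. Natural speech is represented with `None` as its attack.

```python
def attack_sets(records):
    """Group attack identifiers by split, ignoring natural speech."""
    by_split = {"train": set(), "dev": set(), "eval": set()}
    for split, attack in records:
        if attack is not None:  # None marks a natural (real) utterance
            by_split[split].add(attack)
    return by_split


def unknown_attacks(records):
    """Return the attacks that occur only in the evaluation set."""
    sets = attack_sets(records)
    # Training and development are expected to share the same algorithms.
    if sets["train"] != sets["dev"]:
        raise ValueError("training and development sets use different attacks")
    return sets["eval"] - sets["train"]


# Toy metadata with hypothetical attack identifiers.
toy_records = [
    ("train", None), ("train", "A01"), ("train", "A02"),
    ("dev", "A01"), ("dev", "A02"), ("dev", None),
    ("eval", "A01"), ("eval", "A03"), ("eval", None),
]
print(sorted(unknown_attacks(toy_records)))  # prints ['A03']
```

In this toy example, `A01` and `A02` are known attacks because they appear in training and development, while `A03` appears only in evaluation and is therefore an unknown attack.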

Dataset Files

Files have not been uploaded for this dataset