Standard Dataset
Extended-Length Audio Dataset for Synthetic Voice Detection and Speaker Recognition (ELAD-SVDSR)
- Submitted by: Ajan Ahmed
- Last updated: Tue, 01/28/2025 - 23:30
- DOI: 10.21227/ab5w-0c23
Abstract
The Extended-Length Audio Dataset for Synthetic Voice Detection and Speaker Recognition (ELAD-SVDSR) is a resource designed to advance research in synthetic voice (DeepFake) detection and automatic speaker recognition (ASR). It contains recordings of about 45 minutes from each of 36 participants, who read different newspaper articles aloud during controlled sessions captured with five different high-quality microphones. It also includes synthetic voices of 20 of these subjects, generated with open-source and commercial software. The dataset supports text-dependent analysis and may enable diverse ASR modeling. The extended duration of the audio may allow detection of subtle artifacts and the generation of higher-quality synthetic samples with tools such as Tortoise TTS and ElevenLabs, which already perform well on shorter segments. Comprehensive metadata on speaker demographics and recording conditions is expected to provide deeper insight into voice characteristics and model efficacy. ELAD-SVDSR is publicly accessible, with all personal data anonymized to protect privacy, and is expected to drive significant advancements in biometric security, audio forensics, and voice authentication systems.
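For a rough sense of scale, the figures in the abstract can be multiplied out. This is a minimal sketch, assuming each participant contributes a single session of about 45 minutes and that all five microphones capture every session; the page does not state either point exactly, and the constant and function names are hypothetical, not part of the dataset.

```python
# Back-of-the-envelope estimate of the bona fide (real) audio in ELAD-SVDSR.
# Assumptions (hypothetical, not confirmed by the dataset page): each participant
# contributes one ~45-minute session, and every session is captured by all five
# microphones. Synthetic samples are not counted here.
NUM_PARTICIPANTS = 36
MINUTES_PER_PARTICIPANT = 45  # the abstract says "around 45 minutes"
NUM_MICROPHONES = 5


def estimate_hours(participants: int, minutes_each: float, mics: int) -> tuple[float, float]:
    """Return (hours of speech, hours of audio summed over all microphone channels)."""
    speech_hours = participants * minutes_each / 60
    channel_hours = speech_hours * mics
    return speech_hours, channel_hours


speech, channels = estimate_hours(NUM_PARTICIPANTS, MINUTES_PER_PARTICIPANT, NUM_MICROPHONES)
print(f"~{speech:.0f} h of speech, ~{channels:.0f} h across all microphones")
# prints "~27 h of speech, ~135 h across all microphones"
```

Under these assumptions the real recordings amount to roughly 27 hours of speech, or about 135 hours if each microphone channel is counted separately; the synthetic voices of the 20 cloned subjects are outside this estimate.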
Instructions are given in the readme file (readme_ELAD_SVDSR.txt).
Documentation
Attachment | Size |
---|---|
EULA_ELAD_SVDSR.pdf | 28.46 KB |
readme_ELAD_SVDSR.txt | 706 bytes |