Extended-Length Audio Dataset for Synthetic Voice Detection and Speaker Recognition (ELAD-SVDSR)

Citation Author(s):
Rahul Vijaykumar, Clarkson University
Ajan Ahmed, Clarkson University
John Parker, Clarkson University
Aidan Collins, Clarkson University
Dinesh Kumar Pendyala, Clarkson University
Masudul H. Imtiaz, Clarkson University
Submitted by:
Ajan Ahmed
Last updated:
Tue, 01/28/2025 - 23:30
DOI:
10.21227/ab5w-0c23

Abstract 

Introduced here is the Extended-Length Audio Dataset for Synthetic Voice Detection and Speaker Recognition (ELAD-SVDSR), a resource designed to advance research in synthetic voice (DeepFake) detection and automatic speaker recognition (ASR). It features audio recordings of roughly 45 minutes from each of 36 participants, who read aloud different newspaper articles during controlled sessions captured with five high-quality microphones. Synthetic voices generated from 20 subjects of this dataset using open-source and commercial software are also included. By supporting text-dependent analysis, the dataset may enable diverse ASR modeling. The extended duration may allow detection of nuanced artifacts and generation of higher-quality synthetic samples from tools such as Tortoise TTS and ElevenLabs, which already excel on shorter segments. Comprehensive metadata on speaker demographics and recording conditions is expected to provide deeper insight into voice characteristics and model efficacy. Publicly accessible, with all personal data anonymized to protect privacy, ELAD-SVDSR is expected to drive significant advancements in biometric security, audio forensics, and voice authentication systems.
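For orientation, below is a minimal sketch of how the long-form recordings might be inventoried after downloading and extracting the dataset. The directory name, folder layout, and WAV file format are assumptions made for illustration only; the authoritative structure and usage terms are given in readme_ELAD_SVDSR.txt and the EULA.

```python
import glob
import os

import soundfile as sf  # pip install soundfile

# Assumed extraction directory; the real layout is described in readme_ELAD_SVDSR.txt.
DATASET_ROOT = "ELAD_SVDSR"

# Walk the tree and report duration and sample rate for every WAV file found,
# e.g. to confirm the ~45-minute recordings per participant and microphone.
for wav_path in sorted(glob.glob(os.path.join(DATASET_ROOT, "**", "*.wav"), recursive=True)):
    audio, sample_rate = sf.read(wav_path)
    duration_min = len(audio) / sample_rate / 60.0
    rel_path = os.path.relpath(wav_path, DATASET_ROOT)
    print(f"{rel_path}: {duration_min:.1f} min at {sample_rate} Hz")
```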

Instructions: 

Instructions are provided in the readme_ELAD_SVDSR.txt file.

Funding Agency: 
Center for Identification Technology Research and the National Science Foundation
Grant Number: 
1650503

Documentation

Attachment (Size):
EULA_ELAD_SVDSR.pdf (28.46 KB)
readme_ELAD_SVDSR.txt (706 bytes)