ReMASC: Realistic Replay Attack Corpus for Voice Controlled Systems

Citation Author(s):
Yuan Gong (University of Notre Dame)
Jian Yang (University of Notre Dame)
Jacob Huber (University of Notre Dame)
Mitchell MacKnight (University of Notre Dame)
Christian Poellabauer (University of Notre Dame)
Submitted by: Yuan Gong
Last updated: Wed, 06/24/2020 - 03:25
DOI: 10.21227/1mhq-c052
Data Format: *.zip
Links: paper corresponding to the dataset

Keywords: replay attack, spoofing attack, voice-controlled system, microphone array, voice corpus

Abstract

We introduce a new database of voice recordings with the goal of supporting research on the vulnerabilities and protection of voice-controlled systems (VCSs). In contrast to prior efforts, the proposed database contains both genuine voice commands and replayed recordings of such commands, collected in realistic VCS usage scenarios using modern voice assistant development kits. Specifically, the database contains recordings from four systems (each with a different microphone array) in a variety of environmental conditions, with different forms of background noise and different relative positions between speaker and device. To the best of our knowledge, this is the first publicly available database specifically designed for protecting state-of-the-art voice-controlled systems against replay attacks across a range of conditions and environments.

Instructions:

The corpus consists of three sets: the core set, the evaluation set, and the complete set. The complete set contains all the data (i.e., complete set = core set + evaluation set) and lets the user split it freely into training and test sets. The core and evaluation sets together provide a suggested default training/test split. For each set, all *.wav files are in the /data directory and the meta information is in the meta.csv file. The protocol is described in readme.txt. A PyTorch data loader script is provided as an example of how to use the data. A Python resample script is provided for resampling the dataset to a desired sample rate.
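The layout described above (a /data directory of *.wav files next to a meta.csv file) can be indexed with the Python standard library alone. The following is only a minimal sketch and not the PyTorch loader shipped with the corpus: the function names load_index and split_train_test are hypothetical, the column layout of meta.csv is not assumed (rows are returned as raw lists, to be interpreted according to readme.txt), and the random split is just one possible way to divide the complete set.

```python
import csv
import random
from pathlib import Path


def load_index(set_root, meta_name="meta.csv", data_dir="data"):
    """Read meta.csv as raw rows and list the *.wav files of one set.

    Column meanings are NOT assumed here; consult readme.txt for them.
    """
    set_root = Path(set_root)
    with open(set_root / meta_name, newline="") as f:
        rows = list(csv.reader(f))
    wav_paths = sorted((set_root / data_dir).glob("*.wav"))
    return rows, wav_paths


def split_train_test(items, test_fraction=0.2, seed=0):
    """Shuffle deterministically and cut into (train, test) lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = int(round(len(items) * test_fraction))
    return items[n_test:], items[:n_test]
```

For the complete set, something like `rows, wavs = load_index("path/to/complete_set")` followed by `train, test = split_train_test(wavs)` would give a reproducible custom split; the path and the 80/20 ratio are placeholders. A purely random split may place the same speaker or environment on both sides, so a split grouped by the relevant meta.csv field is likely preferable for evaluation.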
