ReMASC: Realistic Replay Attack Corpus for Voice Controlled Systems
We introduce a new database of voice recordings with the goal of supporting research on vulnerabilities and protection of voice-controlled systems (VCSs). In contrast to prior efforts, the proposed database contains both genuine voice commands and replayed recordings of such commands, collected in realistic VCSs usage scenarios and using modern voice assistant development kits. Specifically, the database contains recordings from four systems (each with a different microphone array) in a variety of environmental conditions with different forms of background noise and relative positions between speaker and device. To the best of our knowledge, this is the first publicly available database1 that has been specifically designed for the protection of state-of-the-art voice-controlled systems against various replay attacks in various conditions and environments.
The corpus consists of three sets: the core, evaluation, and complete set. The complete set contains all the data (i.e., complete set = core set + evaluation set) and allows the user to freely split the training/test set. Core/evaluation sets suggest a default training/test split. For each set, all *.wav files are in the /data directory and the meta information is in meta.csv file. The protocol is described in the readme.txt. A PyTorch data loader script is provided as an example of how to use the data. A python resample script is provided for resampling the dataset into the desired sample rate.