Datasets
Standard Dataset
Vocal92: Multimodal Audio Dataset with a Cappella Solo Singing and Speech
- Submitted by:
- Zhuo Deng
- Last updated:
- Mon, 07/08/2024 - 15:59
- DOI:
- 10.21227/7t1f-a022
Abstract
We present Vocal92, a multimodal a cappella solo singing and speech audio dataset spanning approximately 146.73 hours of audio recorded by volunteers. To the best of our knowledge, this is the first dataset of its kind to focus specifically on a cappella solo singing and speech. In addition, we build a singer recognition baseline system using two current state-of-the-art models.
The dataset has a wide range of applications, including music information retrieval, singer recognition, and multimodal speaker recognition. We believe the release of Vocal92 will be of significant interest to researchers in these fields, as well as to the broader multimodal audio processing community.
In this article, we present the first multimodal audio dataset specifically focused on a cappella solo singing and speech. Our dataset, Vocal92, contains both singing and speech from 92 singers and fills a gap in the availability of multimodal audio datasets for this application.
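To illustrate the kind of singer recognition baseline described above, the sketch below shows cosine-similarity scoring of singer embeddings, a common evaluation step in speaker/singer verification. The embedding model itself is not part of this example; the vectors here are random placeholders standing in for the output of a pretrained network (e.g. an x-vector- or ECAPA-style model), and the threshold value is an illustrative assumption, not a value from the Vocal92 baseline.

```python
import numpy as np

def cosine_score(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(enroll_emb: np.ndarray, test_emb: np.ndarray,
           threshold: float = 0.5) -> bool:
    """Accept the trial if the cosine score exceeds the threshold.

    The threshold here is illustrative; in practice it is tuned on a
    development set (e.g. at the equal error rate operating point).
    """
    return cosine_score(enroll_emb, test_emb) >= threshold

# Placeholder embeddings standing in for a pretrained singer-embedding
# model applied to Vocal92 utterances (model not included here).
rng = np.random.default_rng(0)
singer_a = rng.normal(size=192)
same_singer = singer_a + 0.05 * rng.normal(size=192)  # near-duplicate embedding
other_singer = rng.normal(size=192)                   # unrelated embedding

print(verify(singer_a, same_singer))   # same-singer trial
print(verify(singer_a, other_singer))  # cross-singer trial
```

In a real baseline, enrollment and test embeddings would be extracted from the dataset's singing and speech recordings, and performance reported as equal error rate over many such trials.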
Comments
The complete dataset is available at https://pan.baidu.com/s/1Pn62DHfal2OOZ_5JqgGBdQ with jnz5 as the extraction code.