Vocal92: Multimodal Audio Dataset with a Cappella Solo Singing and Speech

Citation Author(s):
Zhuo Deng
Ruohua Zhou
Submitted by:
Zhuo Deng
Last updated:
Mon, 07/08/2024 - 15:59
DOI:
10.21227/7t1f-a022

Abstract 

We present Vocal92, a multimodal a cappella solo singing and speech audio dataset spanning approximately 146.73 hours, sourced from volunteers. To the best of our knowledge, this is the first dataset of its kind that specifically focuses on a cappella solo singing and speech. Furthermore, we use two current state-of-the-art models to construct a singer recognition baseline system.

 

The dataset has a wide range of applications, including music information retrieval, singer recognition, and multimodal speaker recognition. We believe the release of Vocal92 will be of significant interest to researchers in these fields, as well as to the broader community working on multimodal audio processing.

Instructions: 

In this article, we present the first multimodal audio dataset specifically focusing on a cappella solo singing and speech. Our dataset, Vocal92, consists of both singing and speech recordings from 92 singers, filling a gap in the availability of multimodal audio datasets for this application.

Comments

The complete dataset is available at https://pan.baidu.com/s/1Pn62DHfal2OOZ_5JqgGBdQ with extraction code jnz5.

Submitted by Zhuo Deng on Wed, 03/01/2023 - 01:39