MTC-VC: A Multi-Task Contrastive Learning Method for Efficient and Controllable Voice Cloning

- Submitted by: Rui Zhou
- Last updated: Mon, 04/07/2025 - 07:29
- DOI: 10.21227/wpxz-3c67
Abstract
This dataset is drawn from the LibriSpeech corpus, a publicly available English speech dataset derived from audiobook recordings. The corpus contains approximately 1,000 hours of 16 kHz read speech from over 2,400 speakers, encompassing diverse speaking styles, rates, and regional accents. For contrastive learning, a subset of 100 speakers was sampled, with 20 utterances per speaker, each 3 to 10 seconds long. The dataset provides clean, labeled speech suitable for tasks involving speaker representation, acoustic modeling, and multi-style synthesis.
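The subset selection described in the abstract (100 speakers, 20 utterances each, 3 to 10 seconds per utterance) could be reproduced along the following lines. This is a minimal sketch, not the authors' procedure: the abstract does not say how speakers or utterances were chosen, so uniform random sampling with a fixed seed is an assumption, and the function name `sample_subset` and the `(speaker_id, utterance_id, duration_sec)` tuple layout are hypothetical.

```python
import random
from collections import defaultdict


def sample_subset(utterances, n_speakers=100, n_per_speaker=20,
                  min_sec=3.0, max_sec=10.0, seed=0):
    """Pick n_per_speaker utterances for each of n_speakers speakers.

    utterances: iterable of (speaker_id, utterance_id, duration_sec) tuples.
    This layout is hypothetical; build it from whatever metadata you have.
    Returns a dict mapping speaker_id -> list of chosen utterance_ids.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible

    # Keep only utterances whose duration falls inside the 3-10 s window.
    by_speaker = defaultdict(list)
    for speaker_id, utterance_id, duration_sec in utterances:
        if min_sec <= duration_sec <= max_sec:
            by_speaker[speaker_id].append(utterance_id)

    # A speaker is eligible only with enough in-range utterances.
    eligible = sorted(s for s, utts in by_speaker.items()
                      if len(utts) >= n_per_speaker)
    if len(eligible) < n_speakers:
        raise ValueError(
            f"only {len(eligible)} speakers have {n_per_speaker} "
            f"utterances between {min_sec} and {max_sec} seconds")

    # Assumption: uniform random choice of speakers, then of utterances.
    chosen = rng.sample(eligible, n_speakers)
    return {s: rng.sample(sorted(by_speaker[s]), n_per_speaker)
            for s in chosen}
```

Filtering by duration before choosing speakers ensures that every selected speaker can supply the full 20 utterances; sorting before sampling keeps the result independent of input order for a given seed.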