.wav; .csv; .vtt; .txt

MASC: Massive Arabic Speech Corpus

This paper describes the creation of the Massive Arabic Speech Corpus (MASC). MASC is a dataset that contains 1,000 hours of speech sampled at 16 kHz and crawled from over 700 YouTube channels. The dataset is multi-regional, multi-genre, and multi-dialect, and is intended to advance research and development in Arabic speech technology, with a special emphasis on Arabic speech recognition. In addition to MASC, a pre-trained 3-gram language model and a pre-trained automatic speech recognition model were developed and are made available to interested researchers.
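As a rough illustration of working with the listed file types, the sketch below checks that a WAV file uses the 16 kHz sampling rate stated above and reads timed captions from WebVTT text. It uses only the Python standard library. The helper names (`wav_sample_rate`, `parse_vtt`) are hypothetical, and the sketch assumes nothing about how the corpus arranges or names its files, which this description does not specify.

```python
import re
import wave

EXPECTED_RATE = 16_000  # sampling rate stated in the dataset description

def wav_sample_rate(source):
    """Return the sample rate of a WAV file (path or binary file object)."""
    with wave.open(source, "rb") as wav:
        return wav.getframerate()

# WebVTT timestamps look like "00:01.000" or "00:00:01.000" (hours optional).
_TS = r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
_CUE = re.compile(_TS + r"\s+-->\s+" + _TS)

def _seconds(hours, minutes, seconds, millis):
    """Convert timestamp fields (strings, hours may be None) to seconds."""
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000

def parse_vtt(text):
    """Return a list of (start_seconds, end_seconds, caption) cues."""
    cues = []
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        match = _CUE.search(lines[i])
        if match is None:
            i += 1
            continue
        fields = match.groups()
        start, end = _seconds(*fields[:4]), _seconds(*fields[4:])
        i += 1
        payload = []
        # A cue's text runs until the next blank line.
        while i < len(lines) and lines[i].strip():
            payload.append(lines[i].strip())
            i += 1
        cues.append((start, end, " ".join(payload)))
    return cues
```

A caller might compare `wav_sample_rate(path)` against `EXPECTED_RATE` before feeding audio to a recognizer, and pair each parsed cue with the matching audio span. Whether the corpus transcripts line up with the audio this way is an assumption to verify against the actual files.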

Categories: Artificial Intelligence

