*.wav; *.csv; *.vtt; *.txt
![](https://ieee-dataport.org/sites/default/files/styles/3x2/public/tags/images/artificial-intelligence-2167835_1920.jpg?itok=wAd0kf8k)
This paper describes the creation of the Massive Arabic Speech Corpus (MASC). MASC is a dataset that contains 1,000 hours of speech sampled at 16 kHz and crawled from over 700 YouTube channels. The dataset is multi-regional, multi-genre, and multi-dialect intended to advance the research and development of Arabic speech technology with a special emphasis on Arabic speech recognition. In addition to MASC, a pre-trained 3-gram language model and a pre-trained automatic speech recognition model are also developed and made available to interested researchers.
- Categories: