Arabic speech recognition

This dataset covers the whole Quran: 114 suras (6,236 ayahs) recited by 35 reciters, for a total of approximately 218,000 audio files. The files were downloaded in MP3 format from https://www.a-quran.com/showthread.php?t=11017, and all recitations follow the Hafs from A’asim narration. The dataset figure lists the names of the reciters included in the dataset.
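The approximate file count follows from the other figures in the description. The sketch below is a quick sanity check, assuming (this is not stated in the description) that each reciter contributes exactly one audio file per ayah; the function name `expected_file_count` is illustrative only.

```python
# Figures quoted in the dataset description.
NUM_AYAHS = 6236
NUM_RECITERS = 35


def expected_file_count(num_ayahs: int, num_reciters: int) -> int:
    """Return the file count, ASSUMING one audio file per ayah per reciter."""
    return num_ayahs * num_reciters


total_files = expected_file_count(NUM_AYAHS, NUM_RECITERS)
print(total_files)  # 218260, consistent with "approximately 218,000"
```

A downloaded copy whose file count differs noticeably from this product may be missing recitations or may split ayahs differently.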

 


This paper describes the creation of the Massive Arabic Speech Corpus (MASC). MASC is a dataset that contains 1,000 hours of speech sampled at 16 kHz and crawled from over 700 YouTube channels. The dataset is multi-regional, multi-genre, and multi-dialect, and it is intended to advance research and development in Arabic speech technology, with a special emphasis on Arabic speech recognition. In addition to MASC, a pre-trained 3-gram language model and a pre-trained automatic speech recognition model were developed and are available to interested researchers.
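For readers unfamiliar with the term, a 3-gram language model estimates the probability of a word given the two words before it. The following is a minimal sketch of that idea using unsmoothed maximum-likelihood counts on a hypothetical toy corpus. It is not the MASC language model, whose training data, smoothing, and tooling are not described here; the function names and the placeholder tokens are assumptions made for illustration.

```python
from collections import Counter


def train_trigram_counts(sentences):
    """Count trigrams and their two-word contexts in whitespace-tokenized text."""
    trigram_counts = Counter()
    context_counts = Counter()
    for sentence in sentences:
        # Pad with start markers so the first real word has a two-word context.
        tokens = ["<s>", "<s>"] + sentence.split() + ["</s>"]
        for i in range(len(tokens) - 2):
            context = (tokens[i], tokens[i + 1])
            trigram_counts[context + (tokens[i + 2],)] += 1
            context_counts[context] += 1
    return trigram_counts, context_counts


def trigram_prob(w1, w2, w3, trigram_counts, context_counts):
    """Unsmoothed estimate of P(w3 | w1, w2); returns 0.0 for unseen contexts."""
    seen = context_counts[(w1, w2)]
    if seen == 0:
        return 0.0
    return trigram_counts[(w1, w2, w3)] / seen


# Hypothetical toy corpus with placeholder tokens (not taken from MASC).
toy_corpus = ["a b c", "a b d"]
tri, ctx = train_trigram_counts(toy_corpus)
print(trigram_prob("a", "b", "c", tri, ctx))  # 0.5
```

Practical models add smoothing so that word sequences absent from the training text do not receive zero probability; this sketch omits it for brevity.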
