Because this instructional material will be used for teaching, YouTube will be used: it offers authentic material and allows learners to review and rebuild concepts for national sustainable education with a humanistic national standard. By watching the video transcriptions in the table below, students will be able to better understand educational programs, essential thinking systems, and aspects of standards.
This paper describes the creation of the Massive Arabic Speech Corpus (MASC). MASC is a dataset that contains 1,000 hours of speech sampled at 16 kHz and crawled from over 700 YouTube channels. The dataset is multi-regional, multi-genre, and multi-dialect, and it is intended to advance the research and development of Arabic speech technology, with a special emphasis on Arabic speech recognition. In addition to MASC, a pre-trained 3-gram language model and a pre-trained automatic speech recognition model have been developed and made available to interested researchers.
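To give a rough sense of the scale implied by 1,000 hours at 16 kHz, the arithmetic can be sketched as follows. This is only a back-of-the-envelope illustration: the 1,000-hour figure is treated as nominal, and the 16-bit mono PCM encoding is an assumption that the text above does not state.

```python
# Back-of-the-envelope scale of a 1,000-hour, 16 kHz speech corpus.
# HOURS and SAMPLE_RATE_HZ come from the description above;
# BYTES_PER_SAMPLE is an assumption (16-bit mono PCM), not a stated fact.
HOURS = 1_000
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2  # assumed: 16-bit, single channel


def total_samples(hours: int, sample_rate_hz: int) -> int:
    """Number of audio samples in `hours` of audio at `sample_rate_hz`."""
    return hours * 3600 * sample_rate_hz


def uncompressed_bytes(hours: int, sample_rate_hz: int, bytes_per_sample: int) -> int:
    """Raw PCM size in bytes under the assumed sample width."""
    return total_samples(hours, sample_rate_hz) * bytes_per_sample


samples = total_samples(HOURS, SAMPLE_RATE_HZ)
size_gb = uncompressed_bytes(HOURS, SAMPLE_RATE_HZ, BYTES_PER_SAMPLE) / 1e9

print(f"samples: {samples:,}")              # 57,600,000,000
print(f"raw PCM size: {size_gb:.1f} GB")    # 115.2 GB (decimal gigabytes)
```

Under these assumptions the corpus amounts to about 57.6 billion samples, or roughly 115 GB of uncompressed audio. The size actually distributed may differ depending on the encoding and container format used.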