Children Arabic Utterances for Mispronunciation Detection
- Submitted by: Sherin Moussa
- Last updated: Mon, 06/26/2023 - 07:38
- DOI: 10.21227/p5k8-6m10
Abstract
Children Arabic Utterances for Mispronunciation Detection Dataset
Audio samples were recorded from 27 Egyptian children (14 boys and 13 girls, aged 7 to 12), each pronouncing 16 words. The dataset is divided into two top-level folders, Correct and Wrong, according to pronunciation. Each folder is further split into 27 subfolders, one per speaker, and each subfolder contains the .wav files of the words pronounced by that speaker. The recordings were processed with Audacity to produce mono .wav files at a sampling rate of 16 kHz. The dataset was collected and annotated for segmental pronunciation errors by Arabic linguistics experts from NahdetMisr Publishing House (https://nahdetmisr.com/).
Index  Word
1      عين
9      شرب
10     خرج
11     دخل
14     عائلة
21     مسجد
23     درج
26     ضحك
27     بكى
29     رسم
30     كتب
31     فتح
32     غسل
33     قرأ
36     دب
40     حصان
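The folder layout and audio format described above can be sketched in code. The following is a minimal Python sketch, not part of the dataset itself: it assumes the extracted data sits under a single root as root/Correct/<speaker folder>/<file>.wav and root/Wrong/<speaker folder>/<file>.wav. The helper names index_dataset and is_mono_16k are hypothetical, and the actual speaker-folder and file names are not documented on this page.

```python
import wave
from pathlib import Path

# Top-level folder names taken from the dataset description.
LABELS = ("Correct", "Wrong")


def index_dataset(root):
    """Return a list of (wav_path, label, speaker_folder) tuples.

    Assumes the layout root/<Correct|Wrong>/<speaker folder>/<file>.wav.
    Speaker-folder and file naming are assumptions, so the speaker is
    reported simply as the name of the parent folder.
    """
    samples = []
    for label in LABELS:
        label_dir = Path(root) / label
        if not label_dir.is_dir():
            continue  # skip a label folder that has not been extracted
        for wav_path in sorted(label_dir.glob("*/*.wav")):
            samples.append((wav_path, label, wav_path.parent.name))
    return samples


def is_mono_16k(wav_path):
    """Check that a file matches the stated format: mono, 16 kHz."""
    with wave.open(str(wav_path), "rb") as wav:
        return wav.getnchannels() == 1 and wav.getframerate() == 16000
```

Calling index_dataset on the extraction root and filtering the result with is_mono_16k is one way to confirm that every file matches the stated format before training a mispronunciation detector.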
We would like to acknowledge NahdetMisr Publishing House for its generous support and collaboration in providing the resources and expertise that greatly contributed to the success of this research project.
For more details, please contact:
Mona A. Sadik and Sherin M. Moussa
Faculty of Computer and Information Sciences,
Ain Shams University
mona.sadik@cis.asu.edu.eg, sherinmoussa@cis.asu.edu.eg
Dataset Files
- Correct-speakers01to04.zip (4.59 MB)
- Correct-speakers05to08.zip (6.43 MB)
- Correct-speakers09to12.zip (6.92 MB)
- Correct-speakers13to16.zip (7.75 MB)
- Correct-speakers17to20.zip (7.09 MB)
- Correct-speakers21to24.zip (4.31 MB)
- Correct-speakers25to27.zip (3.11 MB)
- Wrong-speakers01to04.zip (7.36 MB)
- Wrong-speakers05to08.zip (6.14 MB)
- Wrong-speakers09to12.zip (6.78 MB)
- Wrong-speakers13to16.zip (7.07 MB)
- Wrong-speakers17to20.zip (6.61 MB)
- Wrong-speakers20to24.zip (2.71 MB)
- Wrong-speakers25to27.zip (3.36 MB)
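Once downloaded, the archives listed above can be unpacked in one pass. This is a minimal Python sketch using the standard zipfile module. The function name extract_archives and the directory arguments are hypothetical, and the internal folder structure of each archive is not documented here, so the resulting layout should be inspected after extraction.

```python
import zipfile
from pathlib import Path


def extract_archives(download_dir, target_dir):
    """Unpack every *-speakers*.zip archive from download_dir into target_dir.

    Returns the names of the archives that were extracted, in sorted order.
    Whether the archives already contain Correct/Wrong top-level folders
    is an assumption to verify after extraction.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in sorted(Path(download_dir).glob("*-speakers*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted.append(archive.name)
    return extracted
```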
Comments
Correct and Wrong samples are included.