Air Traffic Controller Radiotelephony Fatigue Corpus
This corpus was approved by the Air Traffic Management Bureau, Civil Aviation Administration of China (CAAC). All speech was recorded from real radiotelephony between air traffic controllers (ATCs) and pilots from December 15, 2020 to January 14, 2021. The raw data comprised around 700,000 segments of ATC speech covering all work stations over 3 daily periods (0200–0700, 1000–1200 and 1330–1530 hours). From these, seven controllers of different genders, age groups, controller levels and control positions were identified, namely ATCs_1 to ATCs_7. Their working speech records were selected from the raw data and pre-processed. Because each speech record contained speech from both pilots and ATCs, individual ATC utterances were cut from the full record according to the standard principles of radiotelephony, using the semantic context and the GoldWave software. Signal preprocessing techniques such as pre-emphasis, framing and windowing were then applied to improve the quality of the speech samples. In addition, a training data set was recorded containing fatigue and non-fatigue speech from different ATCs.
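The preprocessing steps named above (pre-emphasis, framing and windowing) can be illustrated with a minimal sketch. The description does not state the parameters used for this corpus, so the values below are assumptions chosen only for illustration: a pre-emphasis coefficient of 0.97, 400-sample frames with a 160-sample hop (25 ms and 10 ms at an assumed 16 kHz sampling rate) and a Hamming window. The function names are hypothetical and are not part of the dataset.

```python
import math


def pre_emphasis(signal, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1]; alpha=0.97 is an assumed value."""
    if not signal:
        return []
    return [signal[0]] + [
        signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))
    ]


def frame_signal(signal, frame_len, hop_len):
    """Split a signal into overlapping frames, dropping the incomplete tail."""
    return [
        signal[start:start + frame_len]
        for start in range(0, len(signal) - frame_len + 1, hop_len)
    ]


def hamming(length):
    """Symmetric Hamming window (assumed window type)."""
    if length == 1:
        return [1.0]
    return [
        0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1))
        for i in range(length)
    ]


def preprocess(signal, frame_len=400, hop_len=160, alpha=0.97):
    """Pre-emphasize, frame, and window a mono signal given as a list of floats."""
    emphasized = pre_emphasis(signal, alpha)
    window = hamming(frame_len)
    return [
        [sample * weight for sample, weight in zip(frame, window)]
        for frame in frame_signal(emphasized, frame_len, hop_len)
    ]


# Usage on a synthetic one-second 440 Hz tone (not corpus data).
sample_rate = 16000  # assumed sampling rate
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
frames = preprocess(tone)
```

Each entry of `frames` is one windowed frame that a feature extractor could consume. With real recordings, the synthetic tone would be replaced by samples decoded from the speech files in the archives listed below.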
The corpus is divided into seven sub-datasets, one per air traffic controller, named ATCs_1 to ATCs_7. Each sub-dataset consists of original fatigue speech and original non-fatigue speech. The preprocessed features for fatigue and non-fatigue speech are stored in two separate folders. All of the processed data were categorized into fatigue and non-fatigue samples by civil aviation experts.
- ATCs_2_speech.zip (24.90 MB)
- ATCs_4_speech.zip (16.94 MB)
- ATCs_1_speech.zip (33.68 MB)
- ATCs_3_speech.zip (24.74 MB)
- ATCs_5_speech.zip (34.94 MB)
- ATCs_6_speech.zip (45.97 MB)
- ATCs_7_speech.zip (22.34 MB)
- training_data_set.zip (41.01 MB)