Datasets
Standard Dataset
U.S. and China Flight Delay Datasets (including delay, weather categories, airport features, and aviation network crowdedness matrices)
- Submitted by: ZeYu Zhang
- Last updated: Wed, 01/08/2025 - 05:07
- DOI: 10.21227/f33y-yy32
Abstract
The U.S. delay dataset was collected from Kaggle (https://www.kaggle.com/datasets/robikscube/flight-delay-dataset-20182022) and covers three years of flight data, from January 1, 2017, to December 31, 2019. The original data includes 360 airports. We remove airports with few annual flights and keep 75 medium and large airports for our experiments. The U.S. weather dataset summarizes 12 weather categories: normal weather, light rain, moderate rain, heavy rain, light snow, heavy snow, moderate fog, severe fog, precipitation, storm, hail, and severe cold.
The China delay dataset was collected from Xiecheng (https://pan.baidu.com/s/1dEPyMGh#list/path=%2F) and covers roughly two years of flight data, from May 1, 2015, to June 1, 2017. We select airports whose total number of flights exceeds 10,000. From related special-event data, we obtain 10 weather categories, including normal weather, thunderstorms, cloud, thunder, fog, strong winds, storms, snow, and severe convective weather.
During the experiments, we consider only flight records between 6:00 AM and 11:59 PM, as very few flights were observed outside this time window.
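The two selection rules described above (a minimum flight count per airport and a daily time window) can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the record layout `(airport, departure_datetime)`, the function name `filter_records`, and the choice to count flights per airport before applying the time window are all assumptions.

```python
from collections import Counter
from datetime import datetime

# Threshold quoted for the China dataset; the U.S. cutoff is not stated.
MIN_FLIGHTS = 10_000
# 6:00 AM to 11:59 PM is equivalent to "departure hour is 6 or later".
FIRST_HOUR = 6


def filter_records(records, min_flights=MIN_FLIGHTS):
    """Keep records from busy airports that fall inside the daily window.

    `records` is a list of (airport_code, departure_datetime) tuples.
    This layout is hypothetical; the raw files may be organized differently.
    """
    # Assumption: flights are counted per airport over all records.
    counts = Counter(airport for airport, _ in records)
    busy = {airport for airport, n in counts.items() if n > min_flights}
    return [
        (airport, dep)
        for airport, dep in records
        if airport in busy and dep.hour >= FIRST_HOUR
    ]
```

For a quick check on a small sample, pass a lower threshold, for example `filter_records(sample, min_flights=2)`.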
The udata folder contains all U.S. data. The files us_delay_17-19.npy and us_weather17-19.npy hold the three years of delay and weather data, respectively. The aviation network crowdedness matrices are in the crowdedness folder, named us_od_a17-19.npy and us_od_s17-19.npy. The files airports_feature.npy and airports_position.npy hold the features and positions of the U.S. airports, respectively. The cdata folder holds the China dataset in the same layout.
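As a starting point for exploring the files, the sketch below loads whichever of the listed U.S. arrays are present and reports their shapes. The file names are taken from the description above. The placement of the crowdedness folder inside udata, the dictionary keys, and the helper name `load_arrays` are assumptions, and the array shapes are not documented here, so inspect them before use.

```python
from pathlib import Path

import numpy as np

# File names as listed in the dataset description. The relative layout
# (crowdedness/ nested under the data folder) is an assumption.
US_FILES = {
    "delay": "us_delay_17-19.npy",
    "weather": "us_weather17-19.npy",
    "od_a": "crowdedness/us_od_a17-19.npy",
    "od_s": "crowdedness/us_od_s17-19.npy",
    "airport_features": "airports_feature.npy",
    "airport_positions": "airports_position.npy",
}


def load_arrays(root, files=US_FILES):
    """Load every listed .npy file found under `root`, skipping missing ones."""
    root = Path(root)
    arrays = {}
    for key, rel_path in files.items():
        path = root / rel_path
        if path.exists():
            # np.load rejects pickled object arrays by default; pass
            # allow_pickle=True only if you trust the file.
            arrays[key] = np.load(path)
    return arrays


if __name__ == "__main__":
    for name, arr in load_arrays("udata").items():
        print(name, arr.shape, arr.dtype)
```

The China data in cdata could be loaded the same way by passing a second mapping with that folder's file names as the `files` argument.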