Dataset for article 'A Multi-tropical Cyclone Trajectory Prediction Method Based on a Density Map with Memory and Data Fusion'

Name: Dataset for article 'A Multi-tropical Cyclone Trajectory Prediction Method Based on a Density Map with Memory and Data Fusion'
Creator: Dongfang Ma
License: https://creativecommons.org/licenses/by/4.0/
Keywords: Artificial Intelligence

Citation Author(s): Dongfang Ma (Zhejiang University), Zhaoyang Ma (Zhejiang University), Jianmin Lin (Zhejiang University)
Submitted by: Dongfang Ma
Last updated: Tue, 02/11/2025 - 04:47
DOI: 10.21227/bnhz-1v64
Data Format: Python NumPy .npy file

Categories:

Artificial Intelligence

Keywords:

Tropical Cyclone

Abstract

This is the dataset used in the article 'A Multi-tropical Cyclone Trajectory Prediction Method Based on a Density Map with Memory and Data Fusion' by Dongfang Ma, Zhaoyang Ma, Chengying Wu and Jianmin Lin. The authors are with the Institute of Marine Sensing and Networking, Ocean College, Zhejiang University, Zhoushan 316021, China. The dataset contains satellite images, density maps of tropical cyclone (TC) locations, and geopotential height (GPH) maps. The density maps are generated from the TC best track dataset International Best Track Archive for Climate Stewardship (IBTrACS). A density map is an image of a sea area in which the value of each pixel, between 0 and 1, represents the probability that the point is a TC center. The satellite images are generated from the Gridded Satellite (GridSat-B1) dataset, and the GPH maps are generated from the European Centre for Medium-Range Weather Forecasts Reanalysis 5 (ERA5) dataset.

testcode.py is the test script for this study; it outputs the mean distance error of the TC trajectory predictions made by the proposed model.
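For orientation, a mean distance error between predicted and reference density maps could be computed roughly as sketched below. This is not the code in testcode.py. It assumes that each map holds a single TC whose center is the brightest pixel, that the 256*256 grid covers the region 0°-65°N, 100°E-180°E with north at the top, and that distance is measured along a great circle; the function names are hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0
# Region quoted for the satellite crop; the north-up, west-left
# orientation of the pixel grid is an assumption of this sketch.
LAT_MIN, LAT_MAX = 0.0, 65.0
LON_MIN, LON_MAX = 100.0, 180.0


def pixel_to_latlon(row, col, size=256):
    """Return the latitude/longitude of the center of pixel (row, col)."""
    lat = LAT_MAX - (row + 0.5) * (LAT_MAX - LAT_MIN) / size
    lon = LON_MIN + (col + 0.5) * (LON_MAX - LON_MIN) / size
    return lat, lon


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = np.radians(lat1), np.radians(lat2)
    dphi = phi2 - phi1
    dlam = np.radians(lon2 - lon1)
    a = np.sin(dphi / 2) ** 2 + np.cos(phi1) * np.cos(phi2) * np.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def mean_distance_error(pred_maps, true_maps):
    """Average distance between the peak pixels of paired density maps."""
    errors = []
    for pred, true in zip(pred_maps, true_maps):
        size = pred.shape[0]  # maps are assumed to be square
        pr, pc = np.unravel_index(np.argmax(pred), pred.shape)
        tr, tc = np.unravel_index(np.argmax(true), true.shape)
        p_lat, p_lon = pixel_to_latlon(pr, pc, size)
        t_lat, t_lon = pixel_to_latlon(tr, tc, size)
        errors.append(haversine_km(p_lat, p_lon, t_lat, t_lon))
    return float(np.mean(errors))
```

Maps containing several TCs would need a peak-detection and matching step in place of the single `argmax`, which this sketch does not attempt.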

Instructions:

den.npy is a numpy array with a shape of (7813,256,256), containing 7813 density maps with a size of 256*256. The density maps are generated from the TC best track dataset International Best Track Archive for Climate Stewardship (IBTrACS), which collects observational data on TCs worldwide, including wind speeds, TC center locations, and intensities. Generating a density map involved three main steps. First, the best track data provided by IBTrACS were obtained for the timestamp of each satellite image. Second, a zero matrix with the same size as the satellite image was created, and the pixel corresponding to each TC center position in the best track data was set to one, indicating a 100% probability that this point is a TC center. Finally, Gaussian filtering was applied to the matrix, so that the value of each pixel became a weighted average of its neighboring pixels; the result is a density map that can contain multiple TC centers.
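The three generation steps can be sketched in NumPy as follows. This is an illustration only: the Gaussian width `sigma`, the kernel radius and the zero padding at the borders are assumptions, since the description does not state the filter parameters, and the function names are hypothetical.

```python
import numpy as np


def gaussian_kernel1d(sigma, radius):
    """Normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def make_density_map(centers, size=256, sigma=3.0):
    """Build a density map from TC center pixels given as (row, col) pairs.

    sigma is a placeholder value, not the one used for den.npy.
    """
    # Step 2: zero matrix with the TC center pixels set to one.
    dmap = np.zeros((size, size), dtype=np.float64)
    for row, col in centers:
        dmap[row, col] = 1.0

    # Step 3: separable Gaussian filter (rows, then columns).
    # np.convolve with mode="same" pads with zeros at the borders.
    kernel = gaussian_kernel1d(sigma, radius=int(4 * sigma))

    def smooth(vector):
        return np.convolve(vector, kernel, mode="same")

    dmap = np.apply_along_axis(smooth, 1, dmap)
    dmap = np.apply_along_axis(smooth, 0, dmap)
    return dmap


# Two hypothetical TC centers in one map.
example_map = make_density_map([(100, 120), (50, 200)])
```

Because the kernel is normalized, the filtered values stay between 0 and 1, and each TC contributes a smooth peak centered on its best track position.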

img.npy is a numpy array with a shape of (7813,256,256), containing 7813 satellite images with a size of 256*256. The satellite images were obtained from the Gridded Satellite (GridSat-B1) dataset, which contains data from geostationary satellites with global coverage and an image size of 2,000*5,143. The northwest Pacific region was selected for the analysis due to its high incidence of TCs. The images of the northwest Pacific region (0°-65°N, 100°E-180°E) were cropped out and scaled to 256*256 using bilinear interpolation. GridSat-B1 images typically consist of three channels: infrared (IR, 11 μm), water vapor (WV, 6.7 μm), and visible (VIS, 0.6 μm). The IR channel images were used in this study because they have the best quality and can reach the climate data record (CDR) standard.
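The crop-and-rescale step might look like the sketch below. It runs on a small synthetic global field, not on GridSat-B1 itself; the coordinate arrays, the 0.5° grid spacing and the corner-aligned form of bilinear interpolation are assumptions, and the function names are hypothetical.

```python
import numpy as np


def crop_region(field, lats, lons, lat_range=(0.0, 65.0), lon_range=(100.0, 180.0)):
    """Keep the rows/columns whose coordinates fall inside the region."""
    lat_mask = (lats >= lat_range[0]) & (lats <= lat_range[1])
    lon_mask = (lons >= lon_range[0]) & (lons <= lon_range[1])
    return field[np.ix_(lat_mask, lon_mask)]


def bilinear_resize(img, out_h, out_w):
    """Resize a 2-D array with bilinear interpolation (corners aligned)."""
    in_h, in_w = img.shape
    ys = np.linspace(0, in_h - 1, out_h)  # sample rows in input coordinates
    xs = np.linspace(0, in_w - 1, out_w)  # sample columns in input coordinates
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]  # vertical interpolation weights
    wx = (xs - x0)[None, :]  # horizontal interpolation weights
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


# Synthetic global field on a 0.5 degree grid (a stand-in for one IR image).
lats = np.linspace(70.0, -70.0, 281)
lons = np.linspace(-180.0, 180.0, 721)
field = lats[:, None] + 0.01 * lons[None, :]

crop = crop_region(field, lats, lons)
resized = bilinear_resize(crop, 256, 256)
```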

gph.npy is a numpy array with a shape of (7813,6,256,256), containing 7813 geopotential height (GPH) maps, each consisting of six pressure-level layers with a size of 256*256. The GPH data were extracted from the European Centre for Medium-Range Weather Forecasts Reanalysis 5 (ERA5) dataset. ERA5 provides high-resolution reanalyses of atmospheric, terrestrial and oceanic data such as temperature, wind speed, humidity, precipitation and geopotential height from 1979 to the present. In this study, GPH maps at the six pressure levels of 200, 300, 500, 700, 850, and 1,000 hPa were selected, representing the environmental data around the TC.
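A single pressure level can be pulled out of an array laid out like gph.npy as sketched below. The sketch assumes that axis 1 follows the order in which the levels are listed above (200 hPa first, 1,000 hPa last); the description does not confirm this ordering, so it should be checked against the data before use. `LEVELS_HPA` and `select_level` are hypothetical names.

```python
import numpy as np

# Assumed ordering of axis 1; not confirmed by the dataset description.
LEVELS_HPA = (200, 300, 500, 700, 850, 1000)


def select_level(gph, level_hpa):
    """Return the (N, H, W) slice of a (N, 6, H, W) array for one level."""
    index = LEVELS_HPA.index(level_hpa)  # raises ValueError for unknown levels
    return gph[:, index]


# Small synthetic array with the same axis layout as gph.npy.
demo_gph = np.arange(2 * 6 * 4 * 4, dtype=np.float64).reshape(2, 6, 4, 4)
z500 = select_level(demo_gph, 500)
```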

The three datasets correspond to each other in time; they were collected from 2018 to 2021 with a sampling interval of 6 h.
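Since the three arrays are meant to line up sample by sample, a quick consistency check after loading can catch a truncated or mismatched download. The sketch below is one way to do it, assuming the three files sit in the same folder; `load_aligned` is a hypothetical helper, and `mmap_mode="r"` is used so that the large arrays are not read into memory at once.

```python
import os

import numpy as np

EXPECTED_FILES = ("den.npy", "img.npy", "gph.npy")


def load_aligned(folder):
    """Memory-map den/img/gph and verify that their shapes are compatible."""
    den, img, gph = (
        np.load(os.path.join(folder, name), mmap_mode="r")
        for name in EXPECTED_FILES
    )
    n_samples = den.shape[0]
    if img.shape[0] != n_samples or gph.shape[0] != n_samples:
        raise ValueError("arrays do not contain the same number of samples")
    if img.shape[1:] != den.shape[1:] or gph.shape[2:] != den.shape[1:]:
        raise ValueError("arrays do not share the same image size")
    return den, img, gph
```

With the files described here, `load_aligned` would be expected to return arrays of shapes (7813,256,256), (7813,256,256) and (7813,6,256,256).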

Based on the three datasets, we built time-series datasets for model training.

x_mTC.npy is a numpy array with a shape of (1786,8,7,256,256), containing the input training data for multiple-TC scenarios. Here, 8 is the input time series length and 7 is the number of channels. Channel 0 contains density map data, channel 1 contains satellite image data, and channels 2-7 contain GPH data. y_mTC.npy is a numpy array with a shape of (1786,4,256,256), containing the output labels for multiple-TC scenarios; 4 is the output time series length.

x_sTC.npy is a numpy array with a shape of (4249,8,7,256,256), containing the input training data for single-TC scenarios. The input time series length (8), the number of channels (7) and the channel layout are the same as for x_mTC.npy. y_sTC.npy is a numpy array with a shape of (4249,4,256,256), containing the output labels for single-TC scenarios; 4 is the output time series length.
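The relationship between a per-timestep array and the (samples, time, channels, height, width) training arrays can be illustrated with a sliding window, as sketched below. This is not the authors' preprocessing: the stride of one step, the assumption that consecutive samples are always 6 h apart with no gaps, and the function names are all hypothetical, and the sketch does not reproduce the split into single-TC and multiple-TC scenarios. The channel split slices the GPH channels from index 2 onward and does not hard-code an upper bound.

```python
import numpy as np


def make_windows(frames, density, n_in=8, n_out=4, stride=1):
    """Cut a time series into input/label windows.

    frames:  (T, C, H, W) stacked input channels per time step
    density: (T, H, W) density maps used as labels
    returns: x of shape (N, n_in, C, H, W), y of shape (N, n_out, H, W)
    """
    total = n_in + n_out
    if frames.shape[0] < total:
        raise ValueError("time series is shorter than one window")
    starts = range(0, frames.shape[0] - total + 1, stride)
    x = np.stack([frames[s:s + n_in] for s in starts])
    y = np.stack([density[s + n_in:s + total] for s in starts])
    return x, y


def split_channels(x):
    """Split (N, T, C, H, W) inputs into density, satellite and GPH parts."""
    return x[:, :, 0], x[:, :, 1], x[:, :, 2:]


# Synthetic example: 20 time steps, 7 channels, 16*16 pixels.
rng = np.random.default_rng(0)
demo_frames = rng.random((20, 7, 16, 16))
demo_density = demo_frames[:, 0]  # channel 0 holds the density map

x_demo, y_demo = make_windows(demo_frames, demo_density)
den_part, sat_part, gph_part = split_channels(x_demo)
```

Each label window starts at the time step immediately after its input window, so 8 input steps (48 h at a 6 h interval) are paired with the following 4 steps (24 h).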