RSRPSet: the dataset of the 16th CPGMCM
All data used for model training and evaluation come from the 16th Chinese Post-Graduate Mathematical Contest in Modeling (CPGMCM). The dataset was measured by HUAWEI TECHNOLOGIES CO., LTD and includes engineering parameter data, map data and RSRP label data for multiple communities. It is sponsored by the Ministry of Education's Degree and Graduate Education Development Center. The purpose of the dataset is to find a mapping model between engineering parameters, geographical environment and RSRP, so that the RSRP value at a specific geographical location in a new environment can be predicted quickly.
ATTENTION: This is not the original CPGMCM dataset. The data have been preprocessed in accordance with the required privacy rules.
P.S.: Apologies for the late update of the introduction and analysis of this dataset; the delay was caused by COVID-19 in our city.
This dataset was used in the paper "Feature Extraction in Reference Signal Received Power Prediction Based on Convolution Neural Networks", which has been accepted by IEEE Communications Letters.
Please cite this paper if it helps your research work.
Y. Zheng, Z. Liu, R. Huang, J. Wang, W. Xie and S. Liu, "Feature Extraction in Reference Signal Received Power Prediction Based on Convolution Neural Networks," in IEEE Communications Letters, doi: 10.1109/LCOMM.2021.3054862.
(This citation refers to the Early Access version only. It will be updated when the paper is printed.)
The RSRP dataset covers different base station (BS) antennas and the Rx points covered by each BS. It contains 15-D features extracted from the measured data.
The motivation of our work is to improve the prediction accuracy of RSRP. First, the 15-D effective physical features (PFs) were transformed and reduced to 14-D through feature analysis. By comparison with traditional wireless propagation models, we verified the effectiveness of the extracted PFs. Second, we believe that environmental information can further improve RSRP prediction accuracy, so we generated environmental maps (EMs) to reflect the geo-information and proposed an environmental feature extraction method (EFEM), in which a deep learning (DL) approach extracts environmental features (EFs) from the EMs. Finally, we verified that the EFs improve the prediction accuracy of RSRP.
The details of data pre-processing are presented in the description file "data pre-processing.pdf".
To make this dataset easier for more researchers to use, I have optimized its structure. If you want to start a project or research on it, you can follow this guide:
Step 1 Download the dataset file dataset.tar.gz and the Python file read_dataset.py.
Step 2 Decompress the dataset file. (On Linux: tar -zxvf dataset.tar.gz)
Step 3 Set the variable dataset_path (line 33 in read_dataset.py) to the path of YOUR decompressed dataset (such as '~/project/data/' or './data/').
Step 4 Run python read_dataset.py
Note: The code was written for Python 3.5.2. You may need to modify some functions if you use another version of Python.
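For readers who are not on Linux, Step 2 can also be done from Python itself. The following is a minimal sketch using only the standard library; it is not the content of read_dataset.py. The function names (extract_dataset, find_pickles, load_pickle) and the './data/' target directory are assumptions made for illustration, and the layout of files inside the archive is not assumed: the sketch simply searches for ".pickle" files.

```python
import os
import pickle
import tarfile


def extract_dataset(archive_path, dest_dir):
    """Unpack a .tar.gz archive into dest_dir (same effect as `tar -zxvf`)."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        # Only extract archives obtained from a source you trust.
        tar.extractall(path=dest_dir)
    return dest_dir


def find_pickles(root_dir):
    """Return the sorted paths of all .pickle files below root_dir."""
    found = []
    for folder, _subdirs, files in os.walk(root_dir):
        for name in files:
            if name.endswith(".pickle"):
                found.append(os.path.join(folder, name))
    return sorted(found)


def load_pickle(path):
    """Load one pickled object. Only unpickle files from a source you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


if __name__ == "__main__":
    # Hypothetical paths: adjust to wherever you saved the archive.
    archive = "dataset.tar.gz"
    if os.path.exists(archive):
        data_dir = extract_dataset(archive, "./data/")
        for path in find_pickles(data_dir):
            # Print each file with the type of the object stored in it.
            print(path, type(load_pickle(path)).__name__)
```

Printing the type of each loaded object is a quick way to see what the files contain before pointing dataset_path at the same directory.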
1. Updated the dataset to ".pickle" format, which speeds up file loading.
2. Updated the code file "read_dataset.py", which speeds up file loading and adds a simple statistical analysis of the dataset.
3. Added the code file "ML_model.py", which provides examples of using machine learning models to predict RSRP. (Currently supports KNN, decision tree, SVM, neural network, etc.)
4. Completed compatibility with Python 3.6; the code mentioned above can run in Python 3.6.
5. In accordance with the relevant policies and regulations, removed the measured map information from the dataset. If you need the relevant data, we can provide coordinates after a confidentiality agreement is signed, and we suggest obtaining map information from Google Maps.
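To give a feel for the kind of model listed in item 3, here is a minimal sketch of a k-nearest-neighbour (KNN) regressor written with the standard library only. It is not the code in ML_model.py. All function names are hypothetical, and the data are synthetic: 14 random features with a made-up linear target that merely stands in for RSRP and is not a propagation model.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, query, k=3):
    """Predict a target as the mean of the k nearest training targets."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: euclidean(pair[0], query))
    nearest = [target for _features, target in ranked[:k]]
    return sum(nearest) / len(nearest)


def rmse(predicted, actual):
    """Root-mean-square error between predictions and ground truth."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def make_synthetic(n, n_features=14, seed=0):
    """Generate synthetic samples; the target formula is invented."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = [rng.random() for _ in range(n_features)]
        # Made-up target in a dBm-like range, for illustration only.
        y = -100.0 + 20.0 * x[0] - 10.0 * x[1]
        xs.append(x)
        ys.append(y)
    return xs, ys


if __name__ == "__main__":
    features, targets = make_synthetic(200)
    # Simple hold-out split: first 150 samples train, last 50 test.
    train_x, test_x = features[:150], features[150:]
    train_y, test_y = targets[:150], targets[150:]
    predictions = [knn_predict(train_x, train_y, q, k=5) for q in test_x]
    print("RMSE on synthetic hold-out:", rmse(predictions, test_y))
```

With real features, which usually have very different scales, it is common to normalise each feature before computing distances; this sketch skips that step because the synthetic features already lie in [0, 1].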
- Dataset file: dataset.tar.gz (188.97 MB)
- Dataset reader: read_dataset.py (377 bytes)
- Tool functions: ult.py (4.65 kB)
- Machine learning examples: ML_model.py (2.28 kB)