Statistical analysis ToN_IoT Datasets
The Internet of Things (IoT) is reshaping our connected world through the prevalence of lightweight devices connected to the Internet and their communication technologies. Research on intrusion detection in the IoT domain is therefore highly significant. Network intrusion datasets are fundamental to this research, because many attack detection strategies have to be trained and evaluated on them. In this paper, we present the description, statistical analysis, and machine learning evaluation of an IoT dataset called ToN_IoT, and compare it to other recent datasets. This comparison shows not only the importance of heterogeneity within these datasets, but also why even slight differences between datasets can have a large impact on industry applications. In a cross-training experiment, we show that including different data collection methods and a wide diversity of monitored features is crucial for IoT network intrusion datasets to be useful to industry. We also explain that the practical application of IoT datasets in operational environments requires standardized feature descriptions and cyberattack classes. This can only be achieved through a joint effort by the research community to start creating such standards.
The Python and R scripts in the files create the datasets. Running them also requires the original ToN_IoT datasets.
- Script to perform the cross-training experiment: cross-training_experiment.py (9.17 kB)
- Script to perform statistical analysis: statistical_analysis.py (10.70 kB)
- Script to perform basic classification experiment: classification-analysis.R (4.03 kB)
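
The idea behind a cross-training experiment can be sketched as follows: a model is trained on one dataset and then evaluated on every dataset, including the ones it never saw during training. This is only an illustrative sketch and not the code in cross-training_experiment.py. The synthetic data, the nearest-centroid classifier, and all names (`make_dataset`, `fit_centroids`, `cross_train`, `dataset_a`, `dataset_b`) are assumptions made for this example; the real experiment uses the ToN_IoT data and may use different models and features.

```python
import random
from statistics import mean


def make_dataset(n, shift, seed):
    """Build synthetic (features, label) rows; label 1 = attack, 0 = normal.

    `shift` is a hypothetical offset that mimics a different collection setup.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        base = 3.0 if label else 0.0  # attack traffic has higher feature means
        features = [rng.gauss(base + shift, 1.0), rng.gauss(base + shift, 1.0)]
        rows.append((features, label))
    return rows


def fit_centroids(rows):
    """Train a nearest-centroid model: one mean feature vector per class."""
    centroids = {}
    for label in {lbl for _, lbl in rows}:
        members = [f for f, lbl in rows if lbl == label]
        centroids[label] = [mean(col) for col in zip(*members)]
    return centroids


def predict(centroids, features):
    """Return the class whose centroid is closest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda lbl: sum((a - b) ** 2 for a, b in zip(features, centroids[lbl])),
    )


def accuracy(centroids, rows):
    """Fraction of rows classified correctly."""
    return mean(1.0 if predict(centroids, f) == lbl else 0.0 for f, lbl in rows)


def cross_train(datasets):
    """Train on each dataset and evaluate on every dataset (including itself)."""
    results = {}
    for train_name, train_rows in datasets.items():
        model = fit_centroids(train_rows)
        for test_name, test_rows in datasets.items():
            results[(train_name, test_name)] = accuracy(model, test_rows)
    return results


# Two synthetic datasets with the same classes but shifted feature distributions.
datasets = {
    "dataset_a": make_dataset(500, shift=0.0, seed=1),
    "dataset_b": make_dataset(500, shift=2.0, seed=2),
}
results = cross_train(datasets)
for (train_name, test_name), acc in sorted(results.items()):
    print(f"train={train_name} test={test_name} accuracy={acc:.2f}")
```

In this toy setup, the diagonal entries (same dataset for training and testing) are in-sample scores, while the off-diagonal entries show how a shift in feature distributions between datasets can reduce accuracy. A real evaluation would use held-out test splits and the shared feature set of the compared datasets.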