IoT Data from IoT-23, NSL-KDD, and TON_IoT
- Submitted by: Maria Balega
- Last updated: Mon, 07/08/2024 - 15:58
- DOI: 10.21227/yjtm-6x74
Abstract
As the Internet of Things (IoT) continues to evolve, securing IoT networks and devices remains a continuing challenge. The deployment of IoT applications makes protection harder because of the enlarged attack surface and the vulnerable, resource-constrained devices. Anomaly detection is a crucial procedure in protecting IoT, and machine learning algorithms are a promising way to perform it. The literature has not yet identified the optimal anomaly detection models for IoT with regard to both effectiveness and efficiency. To fill this gap, this work thoroughly investigated the effectiveness and efficiency of XGBoost in IoT anomaly detection and compared it with two well-known learning models, Support Vector Machines (SVM) and Deep Convolutional Neural Networks (DCNN). Identifying the optimal anomaly detection models for IoT is highly challenging due to diverse IoT applications and dynamic IoT networking environments, so it is vital to evaluate ML-powered anomaly detection models using multiple datasets collected from different environments. We used three well-known datasets to benchmark these machine learning methods: IoT-23, NSL-KDD, and TON_IoT. Our results show that XGBoost outperformed both SVM and DCNN, achieving accuracies up to 99.98%. Moreover, XGBoost proved to be the most computationally efficient method: in terms of training time, it was 717.75 times faster than SVM and significantly faster than DCNN. These results were further confirmed using real-world IoT data collected from a testbed of physical devices that we recently built. Our evaluation on this real-world data shows that XGBoost can be used to efficiently and accurately detect anomalies in real-world IoT applications.
The uploaded datasets contain IoT traffic from dynamic environments and can be used for IoT anomaly detection. The data is preprocessed for use with machine learning algorithms (specifically XGBoost, SVM, and DCNN). To use the data, read a file in as a CSV by specifying its path. With pandas, this can be done as follows:
import pandas as pd
data = pd.read_csv('/Documents/KDD.csv')
Change the path and the filename as necessary. IoT23_XGB.csv is the preprocessed IoT-23 data for XGBoost. IoT23_SVM_DCNN.csv is the IoT-23 data for SVM and DCNN. NSL-KDD.csv is the preprocessed NSL-KDD data for XGBoost, SVM, and DCNN. TON_IoT_Mean.csv is the preprocessed TON_IoT data for XGBoost, SVM, and DCNN. Once the dataset is specified, the algorithm can be run to produce results, which can then be analyzed. Links to the original datasets can be found below:
https://www.stratosphereips.org/datasets-iot23
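The pairing of files and models described above can be sketched as a small lookup, so that the right CSV is chosen for a given dataset and algorithm. This is only an illustrative sketch: the file names come from the description, while the function names `dataset_file` and `load_dataset`, the dictionary `FILES`, and the dataset/model key strings are hypothetical and not part of the uploaded material.

```python
from pathlib import Path

# File names are taken from the description above. The function names and
# the (dataset, model) key strings are hypothetical, chosen for this sketch.
_ALL_MODELS = ("XGBoost", "SVM", "DCNN")

FILES = {
    ("IoT-23", "XGBoost"): "IoT23_XGB.csv",
    ("IoT-23", "SVM"): "IoT23_SVM_DCNN.csv",
    ("IoT-23", "DCNN"): "IoT23_SVM_DCNN.csv",
}
# NSL-KDD and TON_IoT each use a single file for all three models.
FILES.update({("NSL-KDD", m): "NSL-KDD.csv" for m in _ALL_MODELS})
FILES.update({("TON_IoT", m): "TON_IoT_Mean.csv" for m in _ALL_MODELS})


def dataset_file(data_dir, dataset, model):
    """Return the path of the preprocessed CSV for a dataset/model pair."""
    key = (dataset, model)
    if key not in FILES:
        raise ValueError(f"no preprocessed file for {key!r}")
    return Path(data_dir) / FILES[key]


def load_dataset(data_dir, dataset, model):
    """Read the matching CSV into a pandas DataFrame."""
    import pandas as pd  # deferred so the path lookup works without pandas
    return pd.read_csv(dataset_file(data_dir, dataset, model))
```

For example, assuming the files were downloaded to `/Documents`, `load_dataset('/Documents', 'IoT-23', 'XGBoost')` would read `/Documents/IoT23_XGB.csv`.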