The Development of an Internet of Things (IoT) Network Traffic Dataset with Simulated Attack Data.
Abstract— This research focuses on the requirements for and the creation of an intrusion detection system (IDS) dataset for an Internet of Things (IoT) network domain.
This dataset is supplementary material for the paper "A Comprehensive and Reproducible Comparison of Cryptographic Primitives Execution on Android Devices", with measurements collected from 17 mobile devices and the code needed for reproducibility.
The primary data is located in the Measurement folder, where each device has a corresponding subfolder containing its measurement file. The dataset consists of JSON files, each containing the execution times of the security primitives available on a device. The data was gathered over multiple runs of 250 iterations, and each measurement was taken at an interval of 50 repetitions for every primitive. The main components of the dataset are defined as follows:
1) context – provides details about the device and OS, including device name, model, battery-related information, Software Development Kit (SDK) version, and basic technical specifications.
2) benchmarks – provides entries per primitive, such as:
i) name – the overall identifying title of the primitive, including padding and other optional fields;
ii) params – additional parameters utilized for the execution, if any;
iii) totalRunTimeNs – the total execution time of the primitive;
iv) metrics – provides entries per execution, such as:
(a) timeNs – the collected and processed timing data, including the per-execution entries in runs and the statistical parameters maximum, minimum, and median;
(b) warmupIterations – number of iterations of warmup before measurements started;
(c) repeatIterations – the number of iterations;
(d) thermalThrottleSleepSeconds – the duration of sleep due to thermal throttling.
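The structure listed above can be explored with a few lines of Python. The following is a minimal sketch, not the authors' tooling: the nesting follows the list as written (the exact layout of the real files may differ), the primitive name and all numeric values except the context fields are invented for illustration, and `median_times` is a hypothetical helper.

```python
import json

# Hypothetical sample that mirrors the fields described above.
# Only "model" and "batteryCapacity, mAh" come from the dataset description;
# every other value is made up for illustration.
sample = """
{
  "context": {"model": "Ticwatch E", "batteryCapacity, mAh": 300},
  "benchmarks": [
    {
      "name": "example_primitive",
      "params": {},
      "totalRunTimeNs": 1000,
      "metrics": {
        "timeNs": {"minimum": 10, "maximum": 30, "median": 20, "runs": [10, 20, 30]},
        "warmupIterations": 5,
        "repeatIterations": 50,
        "thermalThrottleSleepSeconds": 0
      }
    }
  ]
}
"""

def median_times(doc):
    """Map each primitive name to its median execution time in nanoseconds."""
    return {b["name"]: b["metrics"]["timeNs"]["median"] for b in doc["benchmarks"]}

doc = json.loads(sample)
result = median_times(doc)
print(doc["context"]["model"], result)
```

For a real measurement file, `json.loads(sample)` would be replaced by `json.load` on the opened file from the device's subfolder.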
An example of the dataset entry:
"model": "Ticwatch E",
"batteryCapacity, mAh": 300,
Note: Project group was supported by the Graduate School of Business National Research University Higher School of Economics.
This dataset comprises sensory data on the in-and-out movement of a miniature vehicle (mobile sink) in agricultural fields. The data is collected from the miniature vehicle using a 9-axis Inertial Measurement Unit (IMU) sensor (MPU-9250) placed on top of the vehicle. Although the vehicle is small, it is designed to handle the hurdles of agricultural land, such as rough and muddy surfaces. This dataset aims to facilitate appropriate path planning in the agricultural field for the automatic cultivation of seeds, manure spreading, and nutrient insertion.
The dataset contains Multivariate Time Series (MTS) of the miniature vehicle’s in-and-out movement in the agricultural field. The miniature vehicle collects sensory data from the Inertial Measurement Unit (IMU) sensor (MPU-9250) mounted on it. The MPU-9250 is a 9-axis sensor used to record the linear and angular motion of the vehicle as it jerks over the uneven surface of the farmland. It comprises a 3-axis accelerometer, a 3-axis gyroscope, and a 3-axis magnetometer. These sensors are connected to a NodeMCU with an attached SD card, which stores the data. The sensory data was collected from sixteen different agricultural fields at a sampling rate of 5 Hz for 5 minutes each. Each field therefore produces 5 × 60 × 5 = 1500 instances of tri-axial sensor readings (accelerometer, gyroscope, and magnetometer), and the total number of instances collected is 1500 × 16 = 24,000.
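The instance counts follow directly from the stated sampling rate, recording duration, and number of fields. A short sketch of that arithmetic (the constant names are chosen here for illustration):

```python
SAMPLING_RATE_HZ = 5      # samples per second, as stated above
DURATION_MINUTES = 5      # recording time per field
NUM_FIELDS = 16           # number of agricultural fields

# Samples per field: rate (1/s) * seconds recorded
instances_per_field = SAMPLING_RATE_HZ * DURATION_MINUTES * 60
# Total across all fields
total_instances = instances_per_field * NUM_FIELDS

print(instances_per_field, total_instances)
```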
The provided dataset computes the exact analytical bit error rate (BER) of the NOMA system in SISO broadcast channels, assuming i.i.d. Rayleigh fading channels. The user must specify the following inputs: 1) Number of users. 2) Modulation orders. 3) Power assignment. 4) Pathloss. 5) Transmit signal-to-noise ratio (SNR). The output is stored in a matrix in which each row corresponds to a user and each column corresponds to a transmit SNR.
Another raw ADS-B signal dataset with labels. The dataset was captured using a BladeRF2 SDR receiver at 1090 MHz with a sample rate of 10 MHz.
In order to obtain the constants of our PID temperature controller, it was necessary to identify the system. The identification of the system allows us, through experimentation, to find the representation of the plant to be able to control it.
The first data file, named "data_2.mat", represents the open-loop test, where the sampling frequency is 100 [Hz]. This data was used to find the period of the pulse train generator, which is twice the slowest sampling time analyzed between the high pulse and low pulse of the input.
Dataset with diverse types of attacks on Programmable Logic Controllers:
1- Denial of Service
2- Man in the Middle
The full documentation of the dataset is available at: https://arxiv.org/abs/2103.09380
The dataset is composed of several files covering the DoS and MiTM attacks.
A sample CSV file is also provided to illustrate the contents of the collected data. The majority of the data is available in pcap format.
Full instructions are available at: https://arxiv.org/abs/2103.09380
Datasets as described in the research paper "Intrusion Detection using Network Traffic Profiling and Machine Learning for IoT Applications". There are two main datasets provided here. The first is the data relating to the initial training of the machine learning module for both normal and malicious traffic; these are in binary visualisation format, compressed into the archive traffic-dataset.zip.
Each dataset is provided as a compressed ZIP file. No password protection is present and no malicious files are contained herein, only the network traffic and image representations relevant to the project.
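Because the archives are plain, unprotected ZIP files, their contents can be inspected without extracting anything. The sketch below uses Python's standard `zipfile` module; `list_members` is a hypothetical helper, and the demo builds a tiny in-memory archive with an invented member name rather than assuming anything about the real layout of traffic-dataset.zip.

```python
import io
import zipfile

def list_members(archive):
    """Return (name, uncompressed_size) pairs without extracting any file."""
    with zipfile.ZipFile(archive) as zf:
        return [(info.filename, info.file_size) for info in zf.infolist()]

# Demo on an in-memory archive; the member name is invented for illustration.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("example/normal_0001.png", b"\x00" * 16)
buf.seek(0)

members = list_members(buf)
print(members)
```

For the published archive, a file path such as `"traffic-dataset.zip"` can be passed to `list_members` in place of the in-memory buffer.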
Smart speakers and voice-based virtual assistants are core components for the success of the IoT paradigm. Unfortunately, they are vulnerable to various privacy threats exploiting machine learning to analyze the generated encrypted traffic. To cope with that, deep adversarial learning approaches can be used to build black-box countermeasures altering the network traffic (e.g., via packet padding) and its statistical information.
This dataset contains several pcap files generated by the Google Home smart speaker placed under different conditions.
- Mic_on_off_8h contains two pcap files generated by keeping the microphone on (with silence) and off, respectively, for 8 hours.
- Mic_on_off_gquic_8h contains two pcap files generated by keeping the microphone on (with silence) and off, respectively, for 8 hours, excluding all network traffic not belonging to the Google gquic protocol.
- Mic_on_off_noise_3d contains three pcap files generated by keeping the microphone on (with silence), off, and on (with noise), respectively, for 3 days.
- Mic_on_off_noise_gquic_3d contains three pcap files generated by keeping the microphone on (with silence), off, and on (with noise), respectively, for 3 days, excluding all network traffic not belonging to the Google gquic protocol.
- media_pcap_anonymized contains several pcap files captured after the execution of queries such as "What's the latest news?" or "Play some music" (each file stores the network traffic collected after the execution of one query).
- travel_pcap_anonymized contains several pcap files captured after the execution of queries such as "How is the weather today?" (each file stores the network traffic collected after the execution of one query).
- utilities_pcap_anonymized contains several pcap files captured after the execution of queries such as "What's on my agenda today?" or "What time is it?" (each file stores the network traffic collected after the execution of one query).
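As a starting point for working with the captures listed above, the following minimal sketch counts packets and captured bytes using only the Python standard library. It assumes the files use the classic pcap format with microsecond timestamps (not pcapng or the nanosecond variant), which the description does not confirm; `pcap_summary` is a hypothetical helper, and the demo runs on a synthetic two-packet capture built in memory.

```python
import struct

def pcap_summary(data: bytes):
    """Return (packet_count, total_captured_bytes) for classic pcap bytes."""
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"   # file written little-endian
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"   # file written big-endian
    else:
        raise ValueError("not a classic microsecond-resolution pcap file")

    offset = 24  # skip the 24-byte global header
    count = total = 0
    while offset + 16 <= len(data):
        # Per-packet record header: ts_sec, ts_usec, incl_len, orig_len
        _sec, _usec, incl_len, _orig_len = struct.unpack_from(endian + "IIII", data, offset)
        if offset + 16 + incl_len > len(data):
            break  # truncated final record
        offset += 16 + incl_len
        count += 1
        total += incl_len
    return count, total

# Synthetic capture: global header followed by two small packets.
header = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
pkt1 = struct.pack("<IIII", 0, 0, 3, 3) + b"abc"
pkt2 = struct.pack("<IIII", 1, 0, 5, 5) + b"hello"

summary = pcap_summary(header + pkt1 + pkt2)
print(summary)
```

For a real capture, the bytes would come from reading one of the pcap files in binary mode. Dedicated tools are a better fit for protocol-level analysis; this sketch only illustrates the per-file statistics that traffic-analysis features are typically built from.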
The dataset was collected to investigate how brainwave signals can be used for industrial insider threat detection. The data was collected using an Emotiv Insight 5-channel device. The dataset contains data from 17 subjects who agreed to participate in this data collection.