The Development of an Internet of Things (IoT) Network Traffic Dataset with Simulated Attack Data.
Abstract— This research focuses on the requirements for and the creation of an intrusion detection system (IDS) dataset for an Internet of Things (IoT) network domain.
The Internet of Things (IoT) is reshaping our connected world, due to the prevalence of lightweight devices connected to the Internet and their communication technologies. Therefore, research towards intrusion detection in the IoT domain has a lot of significance. Network intrusion datasets are fundamental for this research, as many attack detection strategies have to be trained and evaluated using these datasets.
This dataset is supplementary material for the paper "A Comprehensive and Reproducible Comparison of Cryptographic Primitives Execution on Android Devices", with the measurements collected from 17 mobile devices and the code for reproducibility.
The primary data is located in the folder Measurement, where each device has a corresponding subfolder containing its measurement file. The dataset consists of JSON files, each containing the execution times of the security primitives available on a device. The data was gathered in a span of multiple 250 iterations. Each measurement was taken with a 50 repetitions interval for every primitive. The main components of the dataset are defined as follows:
1) context – provides details about the device and OS, including device name, model, battery-related information, Software Development Kit (SDK) version, and basic technical specifications.
2) benchmarks – provides entries per primitive, such as:
i) name – the overall identification title of the primitive, including padding and other optional fields;
ii) params – additional parameters utilized for the execution, if any;
iii) totalRunTimeNs – the total execution time of the primitive;
iv) metrics – provides entries per execution, such as:
(a) timeNs – the collected and processed timing data, including the per-execution entries in runs and the statistical parameters in maximum, minimum, and median;
(b) warmupIterations – the number of warmup iterations before measurements started;
(c) repeatIterations – the number of iterations;
(d) thermalThrottleSleepSeconds – the duration of sleep due to thermal throttling.
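The structure listed above can be read with the standard json module. The following is a minimal sketch, not the authors' tooling: it assumes that benchmarks is a list of objects and that timeNs is nested under metrics as described. The primitive names and all numeric values in the sample are hypothetical.

```python
import json

# Hypothetical miniature file that follows the field names listed above.
# The primitive names and numbers are made up and are not from the dataset.
SAMPLE = """
{
  "context": {"model": "Ticwatch E", "batteryCapacity, mAh": 300},
  "benchmarks": [
    {
      "name": "example_primitive_a",
      "params": {},
      "totalRunTimeNs": 1200,
      "metrics": {
        "timeNs": {"minimum": 10, "maximum": 30, "median": 20,
                   "runs": [10, 20, 30]}
      }
    },
    {
      "name": "example_primitive_b",
      "params": {},
      "totalRunTimeNs": 3400,
      "metrics": {
        "timeNs": {"minimum": 50, "maximum": 60, "median": 55,
                   "runs": [50, 55, 60]}
      }
    }
  ]
}
"""


def median_times(document):
    """Map each benchmark name to its median execution time in nanoseconds."""
    return {
        entry["name"]: entry["metrics"]["timeNs"]["median"]
        for entry in document["benchmarks"]
    }


document = json.loads(SAMPLE)
medians = median_times(document)
print(document["context"]["model"], medians)
```

For a real measurement file, the call to json.loads would be replaced by json.load on the opened file from the corresponding device subfolder.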
An example of the dataset entry:
"model": "Ticwatch E",
"batteryCapacity, mAh": 300,
Note: Project group was supported by the Graduate School of Business National Research University Higher School of Economics.
Dataset with diverse types of attacks on Programmable Logic Controllers:
1- Denial of Service
2- Man in the Middle
The full documentation of the dataset is available at: https://arxiv.org/abs/2103.09380
The dataset is composed of several files covering the DoS attacks and the MiTM attacks.
A sample CSV file is also provided to illustrate the contents of the collected data. The majority of the data is available in pcap format.
Full instructions are available at: https://arxiv.org/abs/2103.09380
Datasets as described in the research paper "Intrusion Detection using Network Traffic Profiling and Machine Learning for IoT Applications". Two main datasets are provided here. The first contains the data used for the initial training of the machine learning module, covering both normal and malicious traffic; it is in binary visualisation format and compressed into the archive traffic-dataset.zip.
Each dataset is provided as a compressed ZIP file. No password protection is present and no malicious files are contained herein, only the network traffic and image representations relevant to the project.
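Since the archives are plain ZIP files without a password, their contents can be inspected before extraction with Python's standard zipfile module. The sketch below is illustrative only: the member names inside the in-memory archive are hypothetical placeholders, not the real layout of traffic-dataset.zip.

```python
import io
import zipfile


def list_archive(fileobj):
    """Return (name, uncompressed size in bytes) for every archive member."""
    with zipfile.ZipFile(fileobj) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


# Build a tiny in-memory archive so the sketch runs without the real download.
# The member names below are hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("normal/sample.png", b"\x89PNG placeholder")
    archive.writestr("malicious/sample.png", b"\x89PNG placeholder!")
buffer.seek(0)

members = list_archive(buffer)
for name, size in members:
    print(name, size)
```

With the downloaded file, list_archive("traffic-dataset.zip") would be called instead, because zipfile.ZipFile also accepts a path.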
Smart speakers and voice-based virtual assistants are core components for the success of the IoT paradigm. Unfortunately, they are vulnerable to various privacy threats exploiting machine learning to analyze the generated encrypted traffic. To cope with that, deep adversarial learning approaches can be used to build black-box countermeasures altering the network traffic (e.g., via packet padding) and its statistical information.
This dataset contains several pcap files generated by the Google Home smart speaker placed under different conditions.
- Mic_on_off_8h contains two pcap files generated by keeping the microphone on (with silence) and off for 8 hours respectively.
- Mic_on_off_gquic_8h contains two pcap files generated by keeping the microphone on (with silence) and off for 8 hours respectively, excluding all network traffic not belonging to the Google gquic protocol.
- Mic_on_off_noise_3d contains three pcap files generated by keeping the microphone on (with silence), off, and on (with noise), respectively, for 3 days.
- Mic_on_off_noise_gquic_3d contains three pcap files generated by keeping the microphone on (with silence), off, and on (with noise), respectively, for 3 days, excluding all network traffic not belonging to the Google gquic protocol.
- media_pcap_anonymized contains several pcap files captured after the execution of queries such as "What's the latest news?" or "Play some music" (each file stores the network traffic collected after the execution of one query).
- travel_pcap_anonymized contains several pcap files captured after the execution of queries such as "How is the weather today?" (each file stores the network traffic collected after the execution of one query).
- utilities_pcap_anonymized contains several pcap files captured after the execution of queries such as "What's on my agenda today?" or "What time is it?" (each file stores the network traffic collected after the execution of one query).
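A first comparison between captures, for example microphone on versus off, can be made from the packet count and the captured byte volume of each file. The sketch below reads these with the standard library only. It assumes the files use the classic libpcap format rather than pcapng, which the description does not state, and it runs on a synthetic two-packet capture rather than on the dataset itself.

```python
import io
import struct

# Magic numbers of the classic pcap format (microsecond and nanosecond variants).
LITTLE_ENDIAN_MAGICS = (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1")
BIG_ENDIAN_MAGICS = (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d")


def pcap_summary(stream):
    """Return (packet_count, captured_bytes) for a classic pcap byte stream."""
    global_header = stream.read(24)
    if len(global_header) < 24:
        raise ValueError("truncated pcap global header")
    magic = global_header[:4]
    if magic in LITTLE_ENDIAN_MAGICS:
        endian = "<"
    elif magic in BIG_ENDIAN_MAGICS:
        endian = ">"
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")

    packets = 0
    captured = 0
    while True:
        record_header = stream.read(16)
        if len(record_header) < 16:
            break  # end of file or truncated trailing record
        # Fields: ts_sec, ts_usec/ts_nsec, captured length, original length.
        _, _, incl_len, _ = struct.unpack(endian + "IIII", record_header)
        stream.read(incl_len)  # skip the packet bytes
        packets += 1
        captured += incl_len
    return packets, captured


def fake_capture(payloads):
    """Build a synthetic little-endian capture, used only to exercise the sketch."""
    header = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
    body = b"".join(
        struct.pack("<IIII", 0, 0, len(p), len(p)) + p for p in payloads
    )
    return io.BytesIO(header + body)


summary = pcap_summary(fake_capture([b"abc", b"defgh"]))
print(summary)
```

For a file from the dataset, the stream would come from open(path, "rb"). Protocol-level analysis, such as isolating gquic traffic, needs a full packet dissector and is outside the scope of this sketch.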
This dataset is part of my Master's research on malware detection and classification using the XGBoost library on an Nvidia GPU. The dataset is a collection of 1.55 million samples, each with 1000 API import features extracted from the jsonl format of the EMBER 2017 v2 and 2018 datasets. All data is pre-processed, and duplicate records are removed. The dataset contains 800,000 malware and 750,000 "goodware" samples.
* FEATURES *
Column name: sha256
Description: SHA256 hash of the example
Column name: appeared
Description: date on which the sample appeared
Type: date (yyyy-mm format)
Column name: label
Description: whether the sample is malware or "goodware"
Type: 0 ("goodware") or 1 (malware)
Column name: GetProcAddress
Description: Most imported function (1st)
Type: 0 (Not imported) or 1 (Imported)
Column name: LookupAccountSidW
Description: Least imported function (1000th)
Type: 0 (Not imported) or 1 (Imported)
The full dataset features header can be downloaded at https://github.com/tvquynh/api_import_dataset/blob/main/full_dataset_fea...
All processing code will be uploaded to https://github.com/tvquynh/api_import_dataset/
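The feature layout described above can be explored with a short script. The following sketch assumes the data is stored as CSV with a header row that uses the listed column names; the description does not confirm the file format. The three rows are hypothetical and include only two of the 1000 API columns.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt: made-up hashes and values, two API columns only.
SAMPLE_CSV = """sha256,appeared,label,GetProcAddress,LookupAccountSidW
aaaa,2017-01,0,1,0
bbbb,2018-03,1,1,1
cccc,2018-05,1,0,0
"""

METADATA_COLUMNS = ("sha256", "appeared", "label")


def summarize(rows):
    """Count samples per class and how many samples import each API function."""
    classes = Counter()
    imports = Counter()
    for row in rows:
        classes["malware" if row["label"] == "1" else "goodware"] += 1
        for column, value in row.items():
            if column not in METADATA_COLUMNS and value == "1":
                imports[column] += 1
    return classes, imports


classes, imports = summarize(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(dict(classes))
print(imports.most_common())
```

On the full dataset the same loop would stream over the real file row by row, which keeps memory use low even with 1.55 million records.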
Three well-known Border Gateway Protocol (BGP) anomalies:
WannaCrypt, Moscow blackout, and Slammer, which occurred in May 2017, May 2005, and January 2003, respectively.
The Route Views BGP update messages are publicly available from the University of Oregon Route Views Project and contain:
WannaCrypt, Moscow blackout, and Slammer: http://www.routeviews.org/routeviews/.
Raw data from the "route collector route-views2" are organized in folders labeled by the year and month of the collection date.
Complete datasets for WannaCrypt, Moscow blackout, and Slammer are available from the Route Views route collector route-views2 site:
University of Oregon Route Views Project: http://www.routeviews.org/routeviews/
Route Views Collector Map: http://www.routeviews.org/routeviews/index.php/map/
University of Oregon Route Views Archive Project: http://archive.routeviews.org/
MRT format RIBs and UPDATEs (quagga bgpd, from route-views2.oregon-ix.net): http://archive.routeviews.org/bgpdata/
The date of last modification and the size of the datasets are also included.
BGP update messages are originally collected in multi-threaded routing toolkit (MRT) format.
"Zebra-dump-parser", written in Perl, is used to convert the BGP update messages to ASCII.
The 37 BGP features were extracted using a C# tool to generate the uploaded datasets (CSV files).
Labels have been added based on the periods when data were collected.
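Labelling by collection period can be illustrated as follows. This is a sketch of the general idea, not the authors' C# tool: the window boundaries and sample times are hypothetical, and the convention that 1 marks records inside the window and 0 marks records outside it is an assumption.

```python
from datetime import datetime


def label_records(timestamps, start, end):
    """Return 1 for timestamps inside [start, end] and 0 otherwise."""
    return [1 if start <= ts <= end else 0 for ts in timestamps]


# Hypothetical window and sample times, chosen only for illustration.
window_start = datetime(2003, 1, 25, 5, 0)
window_end = datetime(2003, 1, 25, 20, 0)
samples = [
    datetime(2003, 1, 25, 4, 59),
    datetime(2003, 1, 25, 5, 0),
    datetime(2003, 1, 25, 12, 30),
    datetime(2003, 1, 25, 20, 1),
]

period_labels = label_records(samples, window_start, window_end)
print(period_labels)
```

Both boundaries are treated as inclusive here; a different convention would only change the comparison operators.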
As an alternative to classical cryptography, Physical Layer Security (PhySec) provides primitives to achieve fundamental security goals like confidentiality, authentication or key derivation. Owing to its origins in the field of information theory, these primitives are rigorously analysed and their information-theoretic security is proven. Nevertheless, the practical realizations of the different approaches take certain assumptions about the physical world for granted.
The data is provided as zipped NumPy arrays with custom headers. To load a file, the NumPy package is required.
NumPy's load function allows for straightforward loading of the datasets.
To load a file “file.npz” the following code is sufficient:
import numpy as np
measurement = np.load('file.npz', allow_pickle=False)
header, data = measurement['header'], measurement['data']
The dataset comes with a supplementary script example_script.py illustrating the basic usage of the dataset.
Design and fabrication outsourcing has made integrated circuits vulnerable to malicious modifications by third parties, known as hardware Trojans (HTs). Over the last decade, the use of side-channel measurements for detecting the malicious manipulation of a chip has been extensively studied. However, the suggested approaches mostly suffer from two major limitations: reliance on a trusted identical chip (i.e., a golden chip), and the untraceable footprints of subtle hardware Trojans that remain inactive during the testing phase.
See the attached document.