The Bot-IoT dataset
The proliferation of IoT systems, has seen them targeted by malicious third parties. To address this challenge, realistic protection and investigation countermeasures, such as network intrusion detection and network forensic systems, need to be effectively developed. For this purpose, a well-structured and representative dataset is paramount for training and validating the credibility of the systems. Although there are several network datasets, in most cases, not much information is given about the Botnet scenarios that were used. This paper proposes a new dataset, so-called Bot-IoT, which incorporates legitimate and simulated IoT network traffic, along with various types of attacks. We also present a realistic testbed environment for addressing the existing dataset drawbacks of capturing complete network information, accurate labeling, as well as recent and complex attack diversity. Finally, we evaluate the reliability of the BoT-IoT dataset using different statistical and machine learning methods for forensics purposes compared with the benchmark datasets. This work provides the baseline for allowing botnet identification across IoT-specific networks.
he BoT-IoT dataset was created by designing a realistic network environment in the Cyber Range Lab of The center of UNSW Canberra Cyber, as shown in Figure 1. The environment incorporates a combination of normal and botnet traffic. The dataset’s source files are provided in different formats, including the original pcap files, the generated argus files and csv files. The files were separated, based on attack category and subcategory, to better assist in labeling process. The captured pcap files are 69.3 GB in size, with more than 72.000.000 records. The extracted flow traffic, in csv format is 16.7 GB in size. The dataset includes DDoS, DoS, OS and Service Scan, Keylogging and Data exfiltration attacks, with the DDoS and DoS attacks further organized, based on the protocol used. To ease the handling of the dataset, we extracted 5% of the original dataset via the use of select MySQL queries. The extracted 5%, is comprised of 4 files of approximately 1.07 GB total size, and about 3 million records.