TestCloudIDS dataset

Citation Author(s):
Lalit Vashishtha, NIT Patna
Kakali Chatterjee, NIT Patna
Submitted by: Lalit Vashishtha
Last updated: Wed, 12/25/2024 - 10:11
DOI: 10.21227/xk1x-zt89

Abstract 

A key challenge in cybersecurity is the absence of a large-scale network dataset that accurately captures modern traffic patterns, diverse intrusion types, and comprehensive network activity. Existing benchmark datasets such as KDDCup99, NSL-KDD, GureKDD, and UNSW-NB15 need to be updated to reflect contemporary cyberattack signatures.

To address this gap, we introduce TestCloudIDS, a new labeled dataset featuring fifteen variants of DDoS attacks in a cloud environment. Existing datasets often lack realism and do not cover the latest attack strategies; TestCloudIDS, by contrast, is designed to mirror real-world scenarios. It covers a broad spectrum of attack situations using both traditional and modern attack vectors, with an emphasis on state-of-the-art tools such as Raven Storm.

Instructions: 

Robust and comprehensive datasets are essential for advancing research in the rapidly evolving field of cybersecurity, particularly in intrusion detection. To meet this need, we created TestCloudIDS, a novel dataset, at the National Institute of Technology Patna, India. The dataset is designed to support intrusion detection with machine learning techniques. Figure 1 presents a schematic diagram of the conceptual framework underlying the dataset's creation. Development of the dataset was structured into three phases:

1. Testbed Creation and Setup

2. Data Acquisition

3. Data Preprocessing

Each step was carried out carefully to ensure the dataset's accuracy and relevance to cybersecurity. This study outlines these steps and the methods used for data collection. Our objective is to offer a dataset that supports ongoing research and encourages new approaches to intrusion detection. The testbed used to capture the dataset is illustrated in Figure 1.

The foundation of the data collection process is a testbed that simulates realistic network intrusion scenarios. It includes XAMPP servers hosting several web services, such as website interfaces, an SQL database, and Perl scripts, which together form a complex network environment for testing. Two attacker machines, one running Windows 7 and one running Kali Linux, generated authentic attack traffic simulating diverse cyber threats, while an Ubuntu machine served as the victim. Interactions between the attackers and the victim, along with regular network traffic, were captured in packet capture (PCAP) files. This configuration supports a wide range of intrusion simulations and provides the comprehensive data needed to develop robust intrusion detection models.
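The PCAP files described above can be read without third-party libraries. The following is a minimal sketch, assuming the captures are stored in the classic libpcap format; recent Wireshark versions save pcapng by default, which this parser deliberately rejects. The names `read_pcap` and `PcapRecord` are hypothetical and are not part of any tooling shipped with TestCloudIDS.

```python
import struct
from typing import List, NamedTuple, Tuple


class PcapRecord(NamedTuple):
    ts_sec: int      # capture timestamp, seconds since the Unix epoch
    ts_frac: int     # microseconds (or nanoseconds for the nanosecond variant)
    orig_len: int    # length of the packet as seen on the wire
    data: bytes      # captured bytes (may be truncated to the snap length)


# First four bytes of a classic pcap file -> struct byte-order prefix.
_MAGICS = {
    b"\xa1\xb2\xc3\xd4": ">",  # big-endian, microsecond timestamps
    b"\xd4\xc3\xb2\xa1": "<",  # little-endian, microsecond timestamps
    b"\xa1\xb2\x3c\x4d": ">",  # big-endian, nanosecond timestamps
    b"\x4d\x3c\xb2\xa1": "<",  # little-endian, nanosecond timestamps
}


def read_pcap(buf: bytes) -> Tuple[int, List[PcapRecord]]:
    """Parse a classic libpcap capture held in memory; return (link_type, records)."""
    if len(buf) < 24:
        raise ValueError("too short for a pcap global header")
    endian = _MAGICS.get(buf[:4])
    if endian is None:
        raise ValueError("not a classic pcap file (pcapng needs a different parser)")
    # Global header: magic, version major, version minor, thiszone,
    # sigfigs, snaplen, link type (24 bytes in total).
    *_, link_type = struct.unpack(endian + "IHHiIII", buf[:24])
    records: List[PcapRecord] = []
    offset = 24
    while offset + 16 <= len(buf):
        # Per-record header: ts_sec, ts_frac, captured length, original length.
        ts_sec, ts_frac, incl_len, orig_len = struct.unpack_from(
            endian + "IIII", buf, offset)
        offset += 16
        data = buf[offset:offset + incl_len]
        if len(data) < incl_len:
            raise ValueError("truncated packet record")
        records.append(PcapRecord(ts_sec, ts_frac, orig_len, data))
        offset += incl_len
    return link_type, records
```

For example, a file written on a little-endian machine starts with the bytes `d4 c3 b2 a1`, so its headers are unpacked with `"<"`; a link type of 1 means each record holds an Ethernet frame.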

As illustrated in Figure 1, the testbed provides a controlled environment for simulating network activity, including normal behavior and different attack types. The goal is to produce a dataset suitable for training and evaluating IDS algorithms.

In this study, each attack type was captured and organized using a structured approach that combined isolated, session-based execution with real-time monitoring and precise labeling. Attacks were carried out on separate dates and in distinct sessions, which kept the attack types clearly segregated and allowed each to be identified accurately. Network packets were recorded during these sessions with Wireshark; predefined IP addresses were assigned to the attacker and victim machines, and specific port numbers were used so that each attack could be tracked precisely.
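The session-based labeling rule described above (separate dates, predefined attacker and victim addresses, specific ports) can be sketched as a lookup against a session table. This is a minimal illustration, assuming Ethernet II frames carrying IPv4 TCP or UDP packets; the `AttackSession` schema, the function names, and the choice to label unmatched packets "Normal" are hypothetical and are not the authors' actual labeling procedure.

```python
import ipaddress
import struct
from datetime import date
from typing import List, NamedTuple, Optional


class FlowKey(NamedTuple):
    src_ip: str
    dst_ip: str
    proto: int       # IP protocol number: 6 = TCP, 17 = UDP
    src_port: int
    dst_port: int


class AttackSession(NamedTuple):
    """One isolated capture session (hypothetical schema)."""
    day: date            # date on which the session was run
    attacker_ip: str     # predefined attacker address
    victim_ip: str       # predefined victim address
    victim_port: int     # port targeted during the session
    label: str           # attack type recorded for the session


def parse_ipv4_ports(frame: bytes) -> Optional[FlowKey]:
    """Extract addresses and ports from an Ethernet II frame carrying IPv4 TCP/UDP.

    Returns None for anything else (ARP, IPv6, VLAN-tagged frames, ICMP, ...).
    """
    if len(frame) < 14 + 20:
        return None
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != 0x0800:                           # 0x0800 = IPv4
        return None
    ip = frame[14:]
    version, ihl = ip[0] >> 4, (ip[0] & 0x0F) * 4     # IHL counts 32-bit words
    proto = ip[9]
    if version != 4 or ihl < 20 or proto not in (6, 17) or len(ip) < ihl + 4:
        return None
    src = str(ipaddress.IPv4Address(ip[12:16]))
    dst = str(ipaddress.IPv4Address(ip[16:20]))
    # TCP and UDP headers both begin with source port, then destination port.
    src_port, dst_port = struct.unpack_from("!HH", ip, ihl)
    return FlowKey(src, dst, proto, src_port, dst_port)


def label_packet(key: FlowKey, day: date, sessions: List[AttackSession]) -> str:
    """Return the label of the matching session, else "Normal".

    Both directions are matched, so the victim's replies to the attacker
    inherit the session's label as well.
    """
    for s in sessions:
        if day != s.day:
            continue
        forward = (key.src_ip == s.attacker_ip and key.dst_ip == s.victim_ip
                   and key.dst_port == s.victim_port)
        reverse = (key.src_ip == s.victim_ip and key.dst_ip == s.attacker_ip
                   and key.src_port == s.victim_port)
        if forward or reverse:
            return s.label
    return "Normal"
```

In a full pipeline, the `day` argument would be derived from each packet's capture timestamp, for example with `datetime.fromtimestamp(ts_sec, tz=timezone.utc).date()`; whether session dates were recorded in UTC or local time is an assumption that would need checking.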

To obtain the password for the archived dataset file, register by sending the following details to lalitkvashishtha@gmail.com:

1. Name

2. Affiliation

3. Research Interest

4. Current Qualification

5. Country

Comments

The dataset is intended for IDS modeling by researchers in the information security community.

Submitted by Lalit Vashishtha on Wed, 12/25/2024 - 10:15

Documentation

A new dataset.docx (167.14 KB)