SDN_Intrusion
- Submitted by: xiaojing yang
- Last updated: Thu, 12/12/2024 - 03:55
- DOI: 10.21227/0z26-3g86
Abstract
Intrusion detection and prevention systems are among the most important defence tools for protecting network users from online threats. As technology has advanced, demand on networks has grown. With the adoption of IoT, cloud, and SDN, users and organizations can readily access services and data as they require. Alongside these benefits, however, such networks are exposed to online threats. Cybercriminals inject malicious traffic into the SDN to steal sensitive information. Network attacks in the SDN can be detected through traffic monitoring. The selected data contains records of real-time traffic captured on a daily basis. The data was originally stored as Packet Capture (PCAP) files and was later converted to a tabular file.
The data contains 79 features: 78 quantitative attributes and 1 qualitative attribute. It is intended for analysis and for detecting network intrusion. The full data was obtained in several segments, which contain different types of network traffic.
From these types of network traffic data, a dataset was chosen that contains records of DDoS, XSS intrusion, brute-force intrusion, SQL injection, and benign traffic. The selected dataset contains 1,188,333 rows of observations, covering both network intrusions and whitelisted traffic, along with 79 features.
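The row, column, and class counts described above can be checked with a few lines of Python. This is a minimal sketch using only the standard library. The label column name `Label` is an assumption, because the description does not name the qualitative attribute.

```python
import csv
from collections import Counter


def summarize_traffic_table(lines, label_column="Label"):
    """Count rows, columns, and per-class records in a tabular traffic file.

    `lines` is any iterable of CSV text lines (an open file works).
    `label_column` is a hypothetical column name; adjust it to the real header.
    """
    reader = csv.DictReader(lines)
    class_counts = Counter()
    n_rows = 0
    for row in reader:
        n_rows += 1
        class_counts[row[label_column].strip()] += 1
    # fieldnames is None when the input is empty, so fall back to an empty list.
    n_columns = len(reader.fieldnames or [])
    return n_rows, n_columns, class_counts
```

To run it on the real file, open the file with `newline=""` and pass the file object as `lines`. If the file matches the description, the function should report 1,188,333 rows and 79 columns, with one class count per traffic type.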
Data preprocessing is the foundation of the experiment, and the preprocessing procedures for both data sets are identical. The steps are as follows:
1) Data set loading and cleansing: default values, NaN values, and Infinity values are eliminated.
2) Normalization and scaling: the data is standardized so that each feature dimension is on a comparable scale, which mitigates data bias during model training.
3) Feature selection: Principal Component Analysis (PCA) is used to reduce the dimensionality of the data while preserving critical information, improving model efficiency and performance.
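The three preprocessing steps can be sketched as follows. This is an illustrative NumPy version, not the authors' implementation. The function name `preprocess` and the `n_components` parameter are assumptions, since the description does not state how many principal components are kept. It also does not handle "default values", because the description does not define them.

```python
import numpy as np


def preprocess(features, n_components):
    """Clean, standardize, and project a numeric feature matrix with PCA.

    `features` has shape (n_samples, n_features); `n_components` is an assumed
    setting, since the description does not state how many components are kept.
    """
    X = np.asarray(features, dtype=float)

    # 1) Cleansing: drop every row that contains NaN or +/-Infinity.
    finite_rows = np.isfinite(X).all(axis=1)
    X = X[finite_rows]

    # 2) Standardization: zero mean and unit variance per feature.
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns would otherwise divide by zero
    X = (X - mean) / std

    # 3) PCA via SVD of the standardized (already centered) matrix.
    _, singular_values, components = np.linalg.svd(X, full_matrices=False)
    projected = X @ components[:n_components].T
    explained = singular_values**2 / np.sum(singular_values**2)
    return projected, finite_rows, explained[:n_components]
```

The returned boolean mask `finite_rows` marks which input rows were kept, so the class labels can be filtered with the same mask to stay aligned with the projected features. The third return value gives the fraction of variance explained by each retained component, which helps when choosing `n_components`.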