Dataset for Image-Based Traffic Classification in SDN
- Submitted by: Hicham Yzzogh
- Last updated: Sun, 05/12/2024 - 12:16
- DOI: 10.21227/722d-7p84
Abstract
Flow-to-image conversion is a pivotal preprocessing step in intrusion detection systems (IDS), where the representation of network flow data significantly influences classifier performance. In this study, we explore how three distinct methods of transforming flow data into images affect classifier performance, using a subset of the InSDN dataset that covers five traffic classes: four attack types (DoS, DDoS, Probe, and BFA) and Normal traffic.
Method 1 converts each row (flow) into a bar chart: the feature values are normalized and rendered as a Matplotlib-generated image. The target variable containing the label is excluded from the conversion.
Method 2 uses the Image Generator for Tabular Data (IGTD) framework based on Euclidean distance. IGTD transforms tabular data into grayscale images in which similar features are positioned close together, providing the spatial structure that Convolutional Neural Networks (CNNs) rely on. It does so by iteratively rearranging the assignment of features to pixels to minimize the discrepancy between the ranking of pairwise feature distances and the ranking of pairwise pixel distances.
Method 3 follows the same IGTD approach but relies on Manhattan distance for feature alignment and image generation.
By evaluating classifiers trained on the images generated by each method, we aim to determine how the choice of flow-to-image conversion technique affects classifier accuracy in intrusion detection. Our findings shed light on the suitability of each method for improving classifier performance in IDS applications, contributing to the optimization of network security systems.
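The ranking alignment that Methods 2 and 3 rely on can be illustrated with a small sketch. It is not the code used to build this dataset: it assumes that the Euclidean/Manhattan choice applies to distances between pixel coordinates, that feature distances are Euclidean distances between feature columns, and it replaces IGTD's own swap schedule with a plain greedy search. All function names are hypothetical.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pair_ranks(dist):
    """Map each index pair to the rank of its distance."""
    pairs = list(dist)
    return dict(zip(pairs, average_ranks([dist[p] for p in pairs])))


def pixel_distances(n_rows, n_cols, metric):
    """Pairwise distances between pixel positions of an n_rows x n_cols image."""
    if metric not in ("euclidean", "manhattan"):
        raise ValueError(f"unknown metric: {metric}")
    coords = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    dist = {}
    for p, q in combinations(range(len(coords)), 2):
        dr = abs(coords[p][0] - coords[q][0])
        dc = abs(coords[p][1] - coords[q][1])
        dist[(p, q)] = dr + dc if metric == "manhattan" else math.hypot(dr, dc)
    return dist


def feature_distances(columns):
    """Pairwise Euclidean distances between feature columns (an assumption)."""
    return {(i, j): math.dist(columns[i], columns[j])
            for i, j in combinations(range(len(columns)), 2)}


def ranking_error(feature_rank, pixel_rank, assignment):
    """Sum of absolute rank differences; assignment[p] is the feature at pixel p."""
    total = 0.0
    for (p, q), rank in pixel_rank.items():
        i, j = sorted((assignment[p], assignment[q]))
        total += abs(feature_rank[(i, j)] - rank)
    return total


def greedy_swap_search(feature_rank, pixel_rank, n_pixels, sweeps=5):
    """Simplified stand-in for IGTD's optimization: keep any swap that lowers the error."""
    assignment = list(range(n_pixels))
    best = ranking_error(feature_rank, pixel_rank, assignment)
    for _ in range(sweeps):
        improved = False
        for p, q in combinations(range(n_pixels), 2):
            assignment[p], assignment[q] = assignment[q], assignment[p]
            err = ranking_error(feature_rank, pixel_rank, assignment)
            if err < best:
                best, improved = err, True
            else:  # undo a swap that did not help
                assignment[p], assignment[q] = assignment[q], assignment[p]
        if not improved:
            break
    return assignment, best


# Toy data: four features observed over three flows, placed on a 2 x 2 image.
columns = [[0.0, 0.1, 0.2], [0.9, 0.8, 1.0], [0.05, 0.1, 0.25], [1.0, 0.9, 0.9]]
feature_rank = pair_ranks(feature_distances(columns))
for metric in ("euclidean", "manhattan"):
    pixel_rank = pair_ranks(pixel_distances(2, 2, metric))
    layout, error = greedy_swap_search(feature_rank, pixel_rank, 4)
    print(metric, layout, error)
```

On a 2 x 2 grid both metrics rank the pixel pairs identically, so the two layouts coincide. From 3 x 3 upward the rankings diverge (a diagonal neighbor is closer than a pixel two steps away under Euclidean distance but tied with it under Manhattan distance), which is where the two IGTD variants can produce different images.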
The dataset contains a CSV file and three folders, one per conversion method: flow-to-bar-chart, IGTD with Euclidean distance, and IGTD with Manhattan distance. Each folder contains five subfolders, one for each traffic class (DoS, DDoS, Probe, BFA, and Normal).
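A layout like this (method folder, then class subfolder, then images) can be indexed with a short script. The sketch below is only illustrative: the actual folder names and image file extensions are not stated here, so the function takes whatever directory names it finds and assumes common image extensions.

```python
from collections import Counter
from pathlib import Path


def index_images(root, extensions=(".png", ".jpg", ".jpeg")):
    """Return (method, label, path) tuples for a root/<method>/<class>/<image> tree.

    The extension list is an assumption; files directly under root
    (such as the CSV file) and non-image files are skipped.
    """
    records = []
    method_dirs = sorted(p for p in Path(root).iterdir() if p.is_dir())
    for method_dir in method_dirs:
        class_dirs = sorted(p for p in method_dir.iterdir() if p.is_dir())
        for class_dir in class_dirs:
            for image_path in sorted(class_dir.iterdir()):
                if image_path.suffix.lower() in extensions:
                    records.append((method_dir.name, class_dir.name, image_path))
    return records


def count_by_class(records):
    """Count images per (method, label) pair, e.g. to check class balance."""
    return Counter((method, label) for method, label, _ in records)
```

Calling `count_by_class(index_images("path/to/dataset"))`, with a placeholder path, gives a quick check that every method folder holds the same five classes before any classifier is trained.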