Standard Dataset
BRURIIoT: A Dataset for Network Anomaly Detection in IIoT with an Enhanced Feature Engineering Approach
- Submitted by: Fahim Al Islam
- Last updated: Thu, 03/06/2025 - 14:10
- DOI: 10.21227/fqqe-g413
Abstract
This paper presents an enhanced methodology for network anomaly detection in Industrial IoT (IIoT) systems using advanced data aggregation and Mutual Information (MI)-based feature selection. The focus is on transforming raw network traffic into meaningful, aggregated forms that capture crucial temporal and statistical patterns. A refined set of 150 features, including unique IP counts, TCP acknowledgment patterns, and ICMP sequence ratios, was identified using MI to enhance detection accuracy. The approach is evaluated on our BRURIIoT dataset, which comprises over 59 million network packets condensed into 3 million records. Unlike existing datasets with limited attack diversity or missing key features, BRURIIoT preserves nuanced attack behaviors in real-world IIoT devices. SHapley Additive exPlanations (SHAP) analysis demonstrated the importance of aggregated features in model predictions. Machine learning classifiers, including Support Vector Machine (SVM), Gradient Boost, XGBoost, CatBoost, KNN, AdaBoost, Random Forest, Extra Trees, and a custom DNN model, were trained on the aggregated data and achieved an accuracy of 99.52%, precision of 98.20%, recall of 98.13%, and F1-score of 98.17%. These results were validated using K-fold cross-validation to verify their robustness and reliability. This work provides a framework for scaling IIoT cyberattack detection through advanced aggregation and feature engineering, toward interpretable, scalable, and effective cybersecurity solutions. The findings address the urgent need for robust anomaly detection techniques for modern IIoT environments.
Download and Loading
- Download the Dataset:
- The dataset is available as a single CSV file on IEEE DataPort.
- Loading in Python:
- A sample code snippet to load the dataset is provided below:

    import pandas as pd

    # Load the aggregated dataset (use the file name of your downloaded copy)
    df = pd.read_csv('BRUIIoT.csv')

    # Display basic information; df.info() prints its summary directly
    df.info()
    print(df.head())

- Preprocessing:
- Although the dataset is preprocessed and standardized, users may perform additional scaling or feature selection based on their modeling needs.
- Label columns (is_attack, attack_label_enc, and attack_label) are included for supervised learning tasks.
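
The optional scaling and feature selection mentioned above could look roughly like the following sketch. It separates the three label columns from the features, standardizes the numeric features, and ranks them by mutual information with is_attack using scikit-learn. The feature names, the synthetic data, and the helper rank_features_by_mi are hypothetical stand-ins, not the dataset's actual schema or the authors' pipeline; since the dataset is described as already standardized, the scaling step may be redundant on the real file.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_classif
from sklearn.preprocessing import StandardScaler

# Label columns named on the dataset page; every other numeric column is a feature.
LABEL_COLUMNS = ["is_attack", "attack_label_enc", "attack_label"]


def rank_features_by_mi(df, target="is_attack", top_k=3):
    """Return the top_k numeric features ranked by mutual information with target."""
    features = df.drop(columns=[c for c in LABEL_COLUMNS if c in df.columns])
    features = features.select_dtypes(include="number")
    scaled = StandardScaler().fit_transform(features)
    scores = mutual_info_classif(scaled, df[target], random_state=0)
    ranked = pd.Series(scores, index=features.columns).sort_values(ascending=False)
    return ranked.head(top_k)


# Synthetic stand-in for the real CSV (hypothetical feature names).
rng = np.random.default_rng(0)
n = 500
is_attack = rng.integers(0, 2, size=n)
demo = pd.DataFrame({
    "unique_src_ip_count": is_attack * 5 + rng.normal(0, 1, n),  # informative
    "tcp_ack_ratio": rng.normal(0, 1, n),                        # noise
    "icmp_seq_ratio": rng.normal(0, 1, n),                       # noise
    "is_attack": is_attack,
    "attack_label_enc": is_attack,
    "attack_label": np.where(is_attack == 1, "attack", "normal"),
})

top_features = rank_features_by_mi(demo, top_k=2)
print(top_features)
```

On the real data, the same function could be called on the DataFrame returned by pd.read_csv. Ranking features on the full dataset before splitting leaks label information into model evaluation, so in a modeling workflow the ranking is better computed on the training split only.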
Usage Considerations
- Exploratory Data Analysis (EDA):
- Investigate the distributions and relationships between features using techniques such as t-SNE projections, histograms, or pair plots.
- Model Training:
- Use standard machine learning libraries (e.g., scikit-learn, XGBoost, TensorFlow/Keras) to build and evaluate models for network anomaly detection.
- Feature Importance:
- The dataset’s rich feature set allows for detailed feature importance analysis. Methods such as SHAP (SHapley Additive exPlanations) are recommended for interpretability.
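
As a minimal sketch of the model training step, the example below evaluates a Random Forest with stratified 5-fold cross-validation and reports accuracy, precision, recall, and F1. It runs on synthetic data from make_classification as a stand-in for the dataset's features and the binary is_attack label; the classifier settings and fold count are assumptions for illustration, not the configuration used by the authors.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic, imbalanced stand-in for the feature matrix and the is_attack label.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=6,
    weights=[0.8, 0.2], random_state=0,
)

model = RandomForestClassifier(n_estimators=50, random_state=0)

# Stratified folds keep the attack/normal ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
metrics = ["accuracy", "precision", "recall", "f1"]
results = cross_validate(model, X, y, cv=cv, scoring=metrics)

# Average each metric over the five held-out folds.
mean_scores = {m: float(np.mean(results[f"test_{m}"])) for m in metrics}
for name, value in mean_scores.items():
    print(f"{name}: {value:.3f}")
```

With the real file, X would be the feature columns and y the is_attack column. For the multi-class attack_label_enc target, the precision, recall, and F1 scorers would need an averaging variant such as "f1_macro".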
Example Applications
- Intrusion Detection:
- Train classifiers (e.g., SVM, Random Forest, XGBoost) to differentiate between normal and anomalous (attack) traffic.
- Cybersecurity Research:
- Evaluate the effectiveness of feature engineering techniques on real-world IIoT data.
- Real-Time Monitoring:
- Develop streaming or real-time anomaly detection systems using the dataset as a benchmark for performance.
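
One way to approximate the real-time monitoring scenario offline is a test-then-train (prequential) loop: each incoming batch is first scored with the current model and only then used to update it. The sketch below does this with scikit-learn's SGDClassifier and partial_fit on a synthetic stream. The batch generator, feature layout, and the helper prequential_accuracy are hypothetical and serve only to illustrate the evaluation pattern.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier


def prequential_accuracy(batches, classes):
    """Test-then-train: score each batch before the model learns from it."""
    model = SGDClassifier(random_state=0)
    correct = 0
    total = 0
    for i, (X_batch, y_batch) in enumerate(batches):
        if i > 0:  # the first batch is used for training only
            correct += int((model.predict(X_batch) == y_batch).sum())
            total += len(y_batch)
        # classes must be supplied on the first partial_fit call
        model.partial_fit(X_batch, y_batch, classes=classes)
    return correct / total


rng = np.random.default_rng(0)


def make_batch(size=200):
    """Synthetic batch: one hypothetical feature separates attack from normal."""
    y = rng.integers(0, 2, size=size)
    X = rng.normal(0, 1, size=(size, 5))
    X[:, 0] += 4 * y
    return X, y


stream = [make_batch() for _ in range(10)]
accuracy = prequential_accuracy(stream, classes=np.array([0, 1]))
print(f"prequential accuracy: {accuracy:.3f}")
```

To replay the actual dataset as a stream, the batches could be drawn from pd.read_csv with its chunksize argument, splitting each chunk into feature columns and the is_attack label.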
Documentation
Attachment | Size |
---|---|
 | 129.04 KB |