BRURIIoT: A Dataset for Network Anomaly Detection in IIoT with an Enhanced Feature Engineering Approach

Citation Author(s):
FAHIM AL ISLAM, Begum Rokeya University, Rangpur, Bangladesh
MD. SHAMSUZZAMAN, Begum Rokeya University, Rangpur, Bangladesh
MD. SHOHANUR ISLAM, Begum Rokeya University, Rangpur, Bangladesh
SHAHIDUL AHAD SAKIB, Begum Rokeya University, Rangpur, Bangladesh
ABU SAYED MD. MOSTAFIZUR RAHAMAN, Jahangirnagar University, Savar, Dhaka, Bangladesh
Submitted by:
Fahim Al Islam
Last updated:
Thu, 03/06/2025 - 14:10
DOI:
10.21227/fqqe-g413

Abstract 

This paper presents an enhanced methodology for network anomaly detection in Industrial IoT (IIoT) systems using advanced data aggregation and Mutual Information (MI)-based feature selection. The focus is on transforming raw network traffic into meaningful, aggregated forms that capture crucial temporal and statistical patterns. A refined set of 150 features, including unique IP counts, TCP acknowledgment patterns, and ICMP sequence ratios, was identified using MI to enhance detection accuracy. The approach is evaluated on our BRURIIoT dataset, which comprises over 59 million network packets condensed into 3 million records. Unlike existing datasets with limited attack diversity or missing key features, BRURIIoT preserves nuanced attack behaviors in real-world IIoT devices. SHapley Additive exPlanations (SHAP) analysis demonstrated the importance of the aggregated features in model predictions. Machine learning classifiers, including Support Vector Machine (SVM), Gradient Boost, XGBoost, CatBoost, KNN, AdaBoost, Random Forest, Extra Trees, and a custom DNN model, were trained on the aggregated data and achieved outstanding performance, with an accuracy of 99.52%, precision of 98.20%, recall of 98.13%, and F1-score of 98.17%. These results were validated using K-fold cross-validation to verify their robustness and reliability. The outcome of this work is an enabling framework for scaling IIoT cyberattack detection through advanced aggregation and feature engineering, with the aim of developing interpretable, scalable, and effective cybersecurity solutions. The findings address the urgent need for robust anomaly detection techniques in modern IIoT environments.

Instructions: 

Download and Loading

  1. Download the Dataset:
    • The dataset is available as a single CSV file on IEEE DataPort.
  2. Loading in Python:

    • A sample code snippet to load the dataset is provided below:

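      import pandas as pd
      
      # Load the aggregated dataset
      # (adjust the path to match the name of the downloaded CSV file)
      df = pd.read_csv('BRUIIoT.csv')
      
      # Display basic information
      # df.info() prints its summary directly and returns None,
      # so it is called without print()
      df.info()
      print(df.head())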
      
  3. Preprocessing:
    • Although the dataset is preprocessed and standardized, users may perform additional scaling or feature selection based on their modeling needs.
    • Label columns (is_attack, attack_label_enc, and attack_label) are included for supervised learning tasks.
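
For supervised learning, the three label columns are usually separated from the features before modeling so that no label leaks into the inputs. The following is a minimal sketch of that step. The helper name `split_features_labels`, the feature names, and the label values in the small stand-in frame are hypothetical placeholders, and the assumption that `attack_label_enc` is an integer encoding of `attack_label` is not confirmed here; consult the Readme for the actual schema.

```python
import pandas as pd

# Label columns named in the dataset instructions.
LABEL_COLUMNS = ["is_attack", "attack_label_enc", "attack_label"]


def split_features_labels(df: pd.DataFrame, target: str = "is_attack"):
    """Return (X, y): all non-label columns as features, one label column as target."""
    if target not in LABEL_COLUMNS:
        raise ValueError(f"target must be one of {LABEL_COLUMNS}")
    # Drop every label column so that no label leaks into the feature matrix.
    X = df.drop(columns=[c for c in LABEL_COLUMNS if c in df.columns])
    y = df[target]
    return X, y


# Tiny stand-in frame; feature names and label values are hypothetical.
demo = pd.DataFrame({
    "feature_a": [0.1, 0.5, 0.9, 0.3],
    "feature_b": [1.0, 0.0, 1.0, 0.0],
    "is_attack": [0, 1, 1, 0],
    "attack_label_enc": [0, 2, 1, 0],
    "attack_label": ["normal", "attack_x", "attack_y", "normal"],
})

X, y = split_features_labels(demo, target="is_attack")
print(list(X.columns))
print(y.tolist())
```

Passing `target="attack_label_enc"` instead would select the multi-class label while still excluding all three label columns from the features.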

Usage Considerations

  • Exploratory Data Analysis (EDA):
    • Investigate the distributions and relationships between features using visualization techniques such as t-SNE projections, histograms, or pair plots.
  • Model Training:
    • Use standard machine learning libraries (e.g., scikit-learn, XGBoost, TensorFlow/Keras) to build and evaluate models for network anomaly detection.
  • Feature Importance:
    • The dataset’s rich feature set allows for detailed feature importance analysis. Methods such as SHAP (SHapley Additive exPlanations) are recommended for interpretability.
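
The model training step can be sketched as MI-based feature selection followed by a classifier, scored with stratified K-fold cross-validation. This is an illustration only, not the authors' pipeline: the data is synthetic (generated with `make_classification` as a stand-in for the feature matrix and the `is_attack` label), and the values `k=8`, `n_estimators=50`, and `n_splits=5` are arbitrary assumptions. It uses MI ranking rather than SHAP; SHAP analysis would require the separate `shap` package.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the feature matrix and a binary attack label.
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=5, random_state=0
)

# Keep the k features with the highest estimated mutual information.
# Selection sits inside the pipeline so it is refit on each training fold,
# which avoids leaking information from the held-out fold.
selector = SelectKBest(partial(mutual_info_classif, random_state=0), k=8)
model = make_pipeline(
    selector,
    RandomForestClassifier(n_estimators=50, random_state=0),
)

# Stratified folds preserve the class ratio in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="f1")

print("F1 per fold:", [round(float(s), 3) for s in scores])
print("Mean F1:", round(float(scores.mean()), 3))
```

On the real dataset, `X` and `y` would come from the loaded DataFrame, and any scikit-learn compatible classifier could replace the Random Forest in the pipeline.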

Example Applications

  • Intrusion Detection:
    • Train classifiers (e.g., SVM, Random Forest, XGBoost) to differentiate between normal and anomalous (attack) traffic.
  • Cybersecurity Research:
    • Evaluate the effectiveness of feature engineering techniques on real-world IIoT data.
  • Real-Time Monitoring:
    • Develop streaming or real-time anomaly detection systems using the dataset as a benchmark for performance.
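
One simple way to use the dataset as a benchmark for streaming detection is to replay records one at a time through a detector and keep running confusion counts. The sketch below does this in pure Python. The function `stream_evaluate`, the `threshold_rule` detector, and the five replayed records are hypothetical stand-ins for a trained model and real dataset rows, and replaying a file in this way only approximates live traffic.

```python
from typing import Callable, Dict, Iterable, Sequence, Tuple

# One record: (feature vector, binary is_attack label).
Record = Tuple[Sequence[float], int]


def stream_evaluate(
    records: Iterable[Record],
    predict: Callable[[Sequence[float]], int],
) -> Dict[str, float]:
    """Score records one at a time and return confusion counts with precision, recall, F1."""
    tp = fp = fn = tn = 0
    for features, label in records:
        pred = predict(features)
        if pred == 1 and label == 1:
            tp += 1
        elif pred == 1 and label == 0:
            fp += 1
        elif pred == 0 and label == 1:
            fn += 1
        else:
            tn += 1
    # Guard against division by zero when a class is never predicted or seen.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall, "f1": f1}


def threshold_rule(features: Sequence[float]) -> int:
    """Hypothetical detector: flag a record when its first feature exceeds 0.5."""
    return 1 if features[0] > 0.5 else 0


# Hypothetical replayed records standing in for dataset rows.
replay = [([0.9], 1), ([0.2], 0), ([0.7], 0), ([0.4], 1), ([0.8], 1)]
result = stream_evaluate(replay, threshold_rule)
print(result)
```

Because `records` is any iterable, the same function could consume rows read lazily from the CSV file, which keeps memory use bounded for a file with millions of records.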

Documentation

Attachment    Size
Readme.pdf    129.04 KB