A large dataset of 609,934 real Modbus packets
- Submitted by: Wael Maged Badawy
- Last updated: Tue, 04/01/2025 - 12:20
- DOI: 10.21227/krx4-9k68
Abstract
This dataset contains 609,934 real Modbus TCP packets collected from industrial control system (ICS) environments, capturing the full byte-level structure of Modbus communication, including MBAP headers and function-specific payloads. Designed to support research in industrial cybersecurity, this dataset addresses the scarcity of diverse and realistic Modbus traffic, which often hampers the development of intrusion detection systems (IDS) and protocol-compliant synthetic data generators.
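To make the byte-level structure concrete, the sketch below splits one Modbus TCP frame into its 7-byte MBAP header (transaction ID, protocol ID, length, unit ID) and its function-specific payload. It follows the public Modbus TCP framing rules, not any tooling shipped with this dataset. The names `ModbusTcpFrame` and `parse_modbus_tcp` are hypothetical, and the input is assumed to be the raw bytes of a single frame, since the dataset's storage format is not stated here.

```python
import struct
from typing import NamedTuple


class ModbusTcpFrame(NamedTuple):
    """Fields of one Modbus TCP frame (hypothetical container for illustration)."""
    transaction_id: int  # echoed by the server to pair requests with responses
    protocol_id: int     # always 0 for Modbus
    length: int          # number of bytes that follow the length field
    unit_id: int         # addressed device behind the TCP endpoint
    function_code: int   # first byte of the PDU
    data: bytes          # function-specific payload


def parse_modbus_tcp(frame: bytes) -> ModbusTcpFrame:
    """Parse raw bytes of a single Modbus TCP frame (assumed input format)."""
    if len(frame) < 8:
        raise ValueError("frame too short for an MBAP header plus function code")
    # MBAP header: three big-endian 16-bit fields followed by one byte.
    transaction_id, protocol_id, length, unit_id = struct.unpack(">HHHB", frame[:7])
    if protocol_id != 0:
        raise ValueError("protocol ID must be 0 for Modbus")
    # The length field counts the unit ID plus the PDU, i.e. everything after byte 6.
    if length != len(frame) - 6:
        raise ValueError("MBAP length field does not match the frame size")
    return ModbusTcpFrame(transaction_id, protocol_id, length, unit_id,
                          frame[7], frame[8:])


# Example: a Read Holding Registers (0x03) request for 2 registers at address 0.
request = bytes.fromhex("000100000006010300000002")
parsed = parse_modbus_tcp(request)
print(parsed)
```

A check like the length comparison above is one simple way to test whether a packet, real or synthetic, is structurally well formed before it is sent to a server.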
The dataset serves as the training foundation for a novel synthetic data generation model based on Wasserstein GANs (WGAN) with Gumbel-Softmax sampling. It preserves the authentic byte distribution, structure, and variability of real-world Modbus interactions, enabling high-fidelity replication. Synthetic packets produced by a model trained on this dataset achieved a 93.92% Test System Reception Rate (TSRR) on a live Modbus server and aligned closely with the original data distribution, confirming the dataset's suitability for both statistical modeling and operational validation.
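For readers unfamiliar with Gumbel-Softmax sampling, the following is a minimal, standard-library sketch of the generic technique, not the model described above. It assumes each packet byte is treated as a 256-way categorical variable, which is a modeling assumption made here for illustration; the function name `gumbel_softmax` and its parameters are likewise hypothetical.

```python
import math
import random


def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Return a relaxed (soft) one-hot sample over the categories in `logits`.

    Generic Gumbel-Softmax: add Gumbel(0, 1) noise to each logit, divide by
    the temperature, and apply a softmax. Lower temperatures push the result
    closer to a one-hot vector.
    """
    noisy = []
    for logit in logits:
        # Clamp the uniform draw away from 0 and 1 so both logs are defined.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel_noise = -math.log(-math.log(u))
        noisy.append((logit + gumbel_noise) / temperature)
    # Numerically stable softmax: subtract the maximum before exponentiating.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Assumed setup: one byte position modeled as 256 categories with equal logits.
probs = gumbel_softmax([0.0] * 256, temperature=0.5, rng=random.Random(0))
byte_value = max(range(256), key=probs.__getitem__)  # most likely byte value
print(byte_value, round(sum(probs), 6))
```

In a GAN setting, a relaxation of this kind is typically used because it keeps the choice of a discrete byte value differentiable, so gradients can flow from the critic back to the generator.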
This resource is ideal for researchers developing anomaly detection, adversarial testing, and protocol fuzzing systems in resource-constrained or privacy-sensitive ICS environments.