synthetic data

Anonymous network traffic is more pervasive than ever because services such as virtual private networks (VPNs) and The Onion Router (Tor) are so accessible. To address the need to identify and classify this traffic, machine learning and deep learning solutions have become the standard. However, high-performing classifiers often scale poorly when applied to real-world traffic classification because network traffic data is heavily skewed.


We present below a sample dataset collected using our framework for synthetic data collection. The framework is efficient in the time needed to collect and annotate data, and it relies on free and open-source software tools and 3D assets. Our approach provides a large number of systematic variations in the parameters used for synthetic image generation. The approach is highly effective: the resulting deep learning model reaches a top-1 accuracy of 72% on the ObjectNet data, which is a new state-of-the-art result.
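For readers unfamiliar with the metric, top-1 accuracy is the fraction of samples for which the single highest-scoring class matches the true label. The following is a minimal sketch of that calculation, not the evaluation code behind the reported result; the function name `top1_accuracy` and the toy scores and labels are hypothetical and chosen only for illustration.

```python
def top1_accuracy(scores, labels):
    """Return the fraction of samples whose highest-scoring class is the true label.

    scores: list of per-class score lists, one inner list per sample.
    labels: list of integer class indices, one per sample.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")

    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score is the top-1 prediction for this sample.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)


# Hypothetical toy data: four samples, three classes (not from the dataset).
toy_scores = [
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
]
toy_labels = [1, 0, 1, 1]

print(top1_accuracy(toy_scores, toy_labels))
```

In this toy case three of the four top-scoring classes match their labels, so the function returns 0.75.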


This dataset accompanies the paper “Quantification of feature importance in automatic classification of power quality distortions” (IEEE International Conference on Harmonics and Quality of Power, March 2020). It contains the features extracted from synthetic signals with power quality distortions; the signals were obtained from a public model (doi: 10.1109/ICHQP.2018.8378902).