JamShield Dataset
- Submitted by: Yagmur Yigit
- Last updated: Thu, 12/05/2024 - 11:24
- DOI: 10.21227/5hzf-w161
Abstract
This dataset was used in the research paper "JamShield: A Machine Learning Detection System for Over-the-Air Jamming Attacks." The research was conducted by Ioannis Panitsas, Yagmur Yigit, Leandros Tassiulas, Leandros Maglaras, and Berk Canberk from Yale University and Edinburgh Napier University.
For any inquiries, please contact Ioannis Panitsas at ioannis.panitsas@yale.edu.
Threat Model
In this work, we assume that a powerful jammer disrupts all types of communications within a particular frequency range, affecting all three 802.11 user channels (channels 1, 6, and 11) in the 2.4 GHz band by broadcasting Additive White Gaussian Noise (AWGN) or by emitting single tones and pulses. Additionally, we assume that this malicious signal disrupts communications among nodes within the jamming radius, partially disabling their communication ability.
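As a rough sanity check on the frequency range such a jammer has to cover, the sketch below computes the centre frequencies of channels 1, 6, and 11 from the standard 2.4 GHz channel numbering (centre = 2407 + 5 × channel, in MHz) and the span needed to blanket all three. The 20 MHz channel width is an assumption made for illustration, and the function names are hypothetical rather than taken from the paper.

```python
# Sketch: how much spectrum a jammer must cover to hit channels 1, 6, and 11.
# CHANNEL_WIDTH_MHZ is an assumed value (20 MHz channels), not from the source.
CHANNEL_WIDTH_MHZ = 20.0


def center_freq_mhz(channel: int) -> float:
    """Centre frequency of a 2.4 GHz 802.11 channel (valid for channels 1-13)."""
    if not 1 <= channel <= 13:
        raise ValueError("channel must be in 1..13")
    return 2407.0 + 5.0 * channel


def jamming_span_mhz(channels, width_mhz=CHANNEL_WIDTH_MHZ):
    """Return (low_edge, high_edge, span) in MHz covering all given channels."""
    centers = [center_freq_mhz(ch) for ch in channels]
    low = min(centers) - width_mhz / 2
    high = max(centers) + width_mhz / 2
    return low, high, high - low


low, high, span = jamming_span_mhz([1, 6, 11])
print(f"{low:.0f}-{high:.0f} MHz, span {span:.0f} MHz")
```

Under these assumptions the three channels occupy roughly 2402 to 2472 MHz, a span of about 70 MHz, which is within the 160 MHz bandwidth quoted for the USRP X310 in the testbed description.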
Dataset
Each dataset corresponds to a specific jamming type. We implemented three types of jammers (constant, random, and reactive), each with varying output power and different jamming signals. We also include datasets recorded without a jammer present. The raw files are available in the data folder.
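The three jammer types are named here without being defined. The sketch below follows the usual textbook definitions, which are an assumption about this testbed: a constant jammer transmits in every time slot, a random jammer alternates between jamming and sleeping bursts of random length, and a reactive jammer transmits only in slots where it senses channel activity. All function names, the slot model, and the burst-length parameters are hypothetical.

```python
import random


def constant_jammer(num_slots):
    """Jam every slot."""
    return [True] * num_slots


def random_jammer(num_slots, max_jam=5, max_sleep=5, seed=0):
    """Alternate jam and sleep bursts whose lengths are drawn uniformly from 1..max."""
    rng = random.Random(seed)
    schedule = []
    jamming = True  # assumed: the jammer starts in the jamming state
    while len(schedule) < num_slots:
        burst = rng.randint(1, max_jam if jamming else max_sleep)
        schedule.extend([jamming] * burst)
        jamming = not jamming
    return schedule[:num_slots]


def reactive_jammer(channel_busy):
    """Jam only in slots where legitimate traffic is sensed on the channel."""
    return [bool(busy) for busy in channel_busy]


# Example: a reactive jammer follows the activity pattern of the victim link.
print(reactive_jammer([0, 1, 1, 0]))
```

Each function returns a per-slot on/off schedule, so the duty cycles of the three types can be compared directly.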
Each dataset file contains various features, starting with a unique identifier for each data sample labelled as "sample." The "station" feature refers to the MAC Address of the station transmitting the data. Several features track the transmitted data, including "tx_total_pkts" for the total number of transmitted packets and "tx_total_bytes" for the total bytes transmitted. Unicast-specific features include "tx_ucast_pkts," which counts the number of unicast packets transmitted, and "tx_ucast_bytes," which represents the total bytes of unicast packets transmitted. For multicast and broadcast transmissions, "tx_mcast_bcast_pkts" counts the number of multicast and broadcast packets, while "tx_mcast_bcast_bytes" represents the total bytes of such packets. The dataset also tracks transmission errors through the "tx_failures" feature, which records the number of transmission failures.
On the receiving side, "rx_data_pkts" counts the received data packets, and "rx_data_bytes" represents the total bytes received in data packets. There are also unicast-specific receiving features like "rx_ucast_pkts" for the number of unicast packets received and "rx_ucast_bytes" for the corresponding bytes. Multicast and broadcast receiving features include "rx_mcast_bcast_pkts" and "rx_mcast_bcast_bytes," which track the number of packets and bytes received, respectively. The dataset includes decryption success data with the "rx_decrypt_succeeds" feature, which counts the number of successful decryption attempts. Retransmission statistics are also present with "tx_data_pkts_retried" indicating the number of retried data packets and "tx_total_pkts_sent" showing the total number of packets sent, including retransmissions. Additional retry statistics include "tx_pkts_retries" for total retries and "tx_pkts_retry_exhausted" for the number of packets that exceeded their retry limit. For reception, "rx_total_pkts_retried" tracks the number of packets retried during reception.
The dataset also includes transmission rate data, with "rate_last_tx_pkt_min" and "rate_last_tx_pkt_max" representing the minimum and maximum transmission rates for the last transmitted packet, respectively. Signal strength and noise floor measurements are available for each antenna. For example, "per_antenna_rssi_last_rx_data_frame_1" to "per_antenna_rssi_last_rx_data_frame_4" record the Received Signal Strength Indicator (RSSI) for the last received data frame across four antennas, while "per_antenna_avg_rssi_rx_data_frames_1" to "per_antenna_avg_rssi_rx_data_frames_4" represent the average RSSI for all received data frames per antenna. Noise floor levels are captured in "per_antenna_noise_floor_1" through "per_antenna_noise_floor_4." Additionally, the dataset measures Signal-to-Interference-plus-Noise Ratio (SINR) for each antenna using "sinr_per_antenna_1" to "sinr_per_antenna_4."
Finally, the "attack" feature indicates whether an attack is present, with 0 representing normal operation and 1 representing an attack.
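The column layout described above can be sketched as a schema plus a small loader. This assumes the raw files are CSV with a header row whose names match the feature names exactly and that every feature other than "sample" and "station" is numeric; the page does not state the file format, so treat both as assumptions. The helper names are hypothetical.

```python
import csv
import io


def numbered(prefix, count=4):
    """Expand a per-antenna feature into prefix_1 .. prefix_count."""
    return [f"{prefix}_{i}" for i in range(1, count + 1)]


COLUMNS = (
    ["sample", "station",
     "tx_total_pkts", "tx_total_bytes", "tx_ucast_pkts", "tx_ucast_bytes",
     "tx_mcast_bcast_pkts", "tx_mcast_bcast_bytes", "tx_failures",
     "rx_data_pkts", "rx_data_bytes", "rx_ucast_pkts", "rx_ucast_bytes",
     "rx_mcast_bcast_pkts", "rx_mcast_bcast_bytes", "rx_decrypt_succeeds",
     "tx_data_pkts_retried", "tx_total_pkts_sent", "tx_pkts_retries",
     "tx_pkts_retry_exhausted", "rx_total_pkts_retried",
     "rate_last_tx_pkt_min", "rate_last_tx_pkt_max"]
    + numbered("per_antenna_rssi_last_rx_data_frame")
    + numbered("per_antenna_avg_rssi_rx_data_frames")
    + numbered("per_antenna_noise_floor")
    + numbered("sinr_per_antenna")
    + ["attack"]
)
NON_FEATURES = {"sample", "station", "attack"}


def load_rows(fileobj):
    """Yield (features, label) pairs; label is 0 (normal) or 1 (attack)."""
    reader = csv.DictReader(fileobj)
    missing = set(COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    for row in reader:
        features = {k: float(row[k]) for k in COLUMNS if k not in NON_FEATURES}
        yield features, int(row["attack"])


# Demonstration on a synthetic one-row file (placeholder values, not real data).
demo = io.StringIO()
writer = csv.DictWriter(demo, fieldnames=COLUMNS)
writer.writeheader()
row = {name: 0 for name in COLUMNS}
row.update({"sample": 1, "station": "00:00:00:00:00:00", "attack": 1})
writer.writerow(row)
demo.seek(0)
features, label = next(load_rows(demo))
print(len(COLUMNS), len(features), label)
```

The schema has 40 columns in total, of which 37 remain as model inputs once the identifier, the MAC address, and the label are set aside.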
Feature Selection
We employed two key techniques to identify the most important features from the initial dataset of 40 features collected from our testbed: Principal Component Analysis (PCA) and Mutual Information (MI). PCA was used to transform the 40 original features into a smaller set of principal components, capturing the most significant variance in the data. Meanwhile, MI assessed the relevance of each feature to the target variable (attack presence), highlighting the most critical factors for detection. The results from both methods were integrated using a weighted voting mechanism, which ultimately selected the 20 most relevant features for jamming attack classification.
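The weighted voting step can be sketched as follows. The source does not give the weights or the normalisation, so the equal weights and min-max scaling used here are assumptions, as are the function names. The inputs are one importance score per feature from each method (for example, a PCA loading magnitude and an MI estimate), computed elsewhere.

```python
def min_max(scores):
    """Rescale a {feature: score} mapping to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {name: 0.0 for name in scores}
    return {name: (value - lo) / (hi - lo) for name, value in scores.items()}


def weighted_vote(pca_scores, mi_scores, k, w_pca=0.5, w_mi=0.5):
    """Combine two per-feature rankings and return the top-k feature names.

    w_pca and w_mi are assumed weights; ties are broken alphabetically.
    """
    if set(pca_scores) != set(mi_scores):
        raise ValueError("both score sets must cover the same features")
    pca_n, mi_n = min_max(pca_scores), min_max(mi_scores)
    combined = {
        name: w_pca * pca_n[name] + w_mi * mi_n[name] for name in pca_scores
    }
    ranked = sorted(combined, key=lambda name: (-combined[name], name))
    return ranked[:k]


# Toy scores for three placeholder features (illustrative values only).
pca = {"f1": 0.9, "f2": 0.1, "f3": 0.5}
mi = {"f1": 0.2, "f2": 0.8, "f3": 0.7}
print(weighted_vote(pca, mi, k=2))
```

With the dataset described here, `k` would be 20 and the score dictionaries would be keyed by the feature names of the original columns.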
The final set of features selected for jamming attack classification includes several key indicators related to transmission and reception data. These begin with "tx_total_pkts," which tracks the total number of packets transmitted, and "tx_total_bytes," which records the total number of bytes transmitted. Unicast-specific transmission features include "tx_ucast_pkts" for the number of unicast packets transmitted and "tx_ucast_bytes" for the total bytes of unicast packets transmitted. The "tx_failures" feature captures the number of transmission failures, while on the receiving side, "rx_data_pkts" and "rx_ucast_pkts" record the total number of data packets and unicast packets received, respectively. Additionally, "rx_data_bytes" measures the total bytes received in data packets.
The dataset also monitors retransmissions through features such as "tx_data_pkts_retried," which indicates the number of data packets that required retransmission, and "tx_total_pkts_sent," which captures the total number of packets sent, including retransmissions. Further details on retransmission attempts include "tx_pkts_retries" for the total number of packet retransmission attempts, and "tx_pkts_retry_exhausted" for the number of packets that exhausted all retry attempts without success.
Transmission rates are recorded with "rate_last_tx_pkt_min" and "rate_last_tx_pkt_max," representing the minimum and maximum transmission rates of the last transmitted packet, respectively. Signal strength is assessed through the Received Signal Strength Indicator (RSSI) for antenna 1 and antenna 2, with "per_antenna_rssi_last_rx_data_frame_1" and "per_antenna_rssi_last_rx_data_frame_2" measuring the RSSI for the last received data frame on these antennas, while "per_antenna_avg_rssi_rx_data_frames_1" and "per_antenna_avg_rssi_rx_data_frames_2" capture the average RSSI for all received data frames on each antenna.
Finally, the Signal-to-Interference-plus-Noise Ratio (SINR) for antenna 1 is provided by the "sinr_per_antenna_1" feature, and the noise floor measurement for antenna 1 is tracked by the "per_antenna_noise_floor_1" feature, both crucial for evaluating signal quality and interference in jamming attack detection.
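For convenience, the 20 selected features listed above can be collected into a single list and used to project a row onto the reduced feature set. This is a minimal sketch; the helper name is hypothetical, and it assumes each row is a mapping from column name to a numeric value or numeric string.

```python
SELECTED_FEATURES = [
    "tx_total_pkts", "tx_total_bytes", "tx_ucast_pkts", "tx_ucast_bytes",
    "tx_failures", "rx_data_pkts", "rx_ucast_pkts", "rx_data_bytes",
    "tx_data_pkts_retried", "tx_total_pkts_sent", "tx_pkts_retries",
    "tx_pkts_retry_exhausted", "rate_last_tx_pkt_min", "rate_last_tx_pkt_max",
    "per_antenna_rssi_last_rx_data_frame_1",
    "per_antenna_rssi_last_rx_data_frame_2",
    "per_antenna_avg_rssi_rx_data_frames_1",
    "per_antenna_avg_rssi_rx_data_frames_2",
    "sinr_per_antenna_1", "per_antenna_noise_floor_1",
]


def select_features(row):
    """Return the selected features of one row as a list of floats, in order."""
    missing = [name for name in SELECTED_FEATURES if name not in row]
    if missing:
        raise KeyError(f"row is missing selected features: {missing}")
    return [float(row[name]) for name in SELECTED_FEATURES]


# Example with a synthetic row (placeholder values, not real measurements).
example_row = {name: "1" for name in SELECTED_FEATURES}
example_row["attack"] = "0"
print(len(select_features(example_row)))
```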
Implementation (Flow Graph)
The jammers were implemented using GNU Radio.
Testbed Setup
Our experimental setup is deployed in an area of 80 m². It includes three fixed-position wireless nodes, two OnePlus 8T smartphones, one wireless access point (AP), one USRP X310, and one edge server (for training/inference of our ML model).
- Wireless Nodes: Each wireless node is a Linux PC with an Intel WLAN 8265/8275 wireless network adapter supporting 802.11 standards.
- OnePlus 8T Smartphones: Each smartphone is equipped with the Qualcomm FastConnect 6900 Wi-Fi chipset. During our experiments, the smartphones were placed in various locations within the lab at random intervals.
- Wireless Access Point (AP): We used the ASUS RT-AX88U Pro as the AP. It supports the 802.11 wireless standards and operates in the 2.4 GHz and 5 GHz bands simultaneously (dual-band). It features 2x2 antennas for enhanced beamforming and supports channel bandwidths from 20 to 160 MHz, with a maximum output power of 20 dBm.
- USRP X310: We employed the USRP X310 radio from Ettus Research to generate and transmit malicious interference signals. This open-source Software Defined Radio (SDR) platform features two extended-bandwidth daughterboard slots, supporting frequencies ranging from 10 MHz to 6 GHz. It is equipped with two individually configurable RF channels, each capable of operating at a maximum sample rate of 200 Msps and providing up to 160 MHz of bandwidth. Additionally, it has a maximum output power exceeding 20 dBm.
- Edge Server: For the training and inference of our proposed JamShield ML-based model, we used a customized edge server equipped with an AMD EPYC 7352 2.3 GHz 24-core processor, 128 GB of DDR4 RAM, and four NVIDIA RTX A5000 GPUs, each with 24 GB of memory.
Reference Scenarios
To mimic both ideal and challenging link characteristics, we considered two different configurations, referred to as the Line Of Sight (LOS) scenario and the Non-Line Of Sight (NLOS) scenario. In the first scenario, we deployed the jammer in the middle of the lab at a height of three meters, with the wireless nodes positioned around the lab at a distance of six meters and at a height of one meter. The jammer maintained a LOS with the wireless nodes, allowing it to generate interference signals that propagated directly to them without obstruction or multipath reflection. In the second scenario, the jammer was placed in a different location, at a height of one meter. To create an NLOS condition, we obstructed the jammer’s signals with a metallic surface and reduced the transmit power by adding attenuation elements at the output of the radio front-end.
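To give a feel for the LOS geometry, the sketch below estimates the jammer-to-node slant distance and the corresponding free-space path loss. It assumes the six-meter distance is horizontal, that propagation is ideal free space, and that the carrier is the channel 6 centre frequency (2437 MHz); none of these are stated in the source, and the function names are hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def slant_distance_m(horizontal_m, jammer_height_m, node_height_m):
    """Straight-line distance between jammer and node antennas."""
    return math.hypot(horizontal_m, jammer_height_m - node_height_m)


def fspl_db(distance_m, freq_hz):
    """Free-space path loss in dB: 20 * log10(4 * pi * d * f / c)."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / SPEED_OF_LIGHT)


# LOS scenario: jammer at 3 m, nodes at 1 m, assumed 6 m horizontal separation.
d = slant_distance_m(6.0, 3.0, 1.0)
loss = fspl_db(d, 2437e6)  # assumed carrier: channel 6 centre frequency
print(f"slant distance {d:.2f} m, free-space path loss {loss:.1f} dB")
```

Under these assumptions the path loss comes out at about 56 dB. The NLOS scenario adds obstruction and attenuator losses that this simple model does not capture.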
The dataset is available on GitHub: https://github.com/panitsasi/JamShield-Dataset