DISC: a dataset for integrated sensing and communication in mmWave systems
- Submitted by: Jesus Lacruz
- Last updated: Wed, 01/22/2025 - 10:44
- DOI: 10.21227/2gm7-9z72
Abstract
This dataset provides Channel Impulse Response (CIR) measurements from standard-compliant IEEE 802.11ay packets to validate Integrated Sensing and Communication (ISAC) methods. The CIR sequences contain reflections of the transmitted packets from people moving in an indoor environment. They are collected with a 60 GHz software-defined radio experimentation platform based on the IEEE 802.11ay Wi-Fi standard. Because the platform operates in full-duplex mode, the measurements are not affected by frequency offsets.
The dataset is divided into two parts:
1) The first one consists of almost 40 minutes of IEEE 802.11ay CIR sequences including signal reflections on 7 subjects performing 4 different activities, i.e., walking (A0), running (A1), sitting down-standing up (A2), and waving hands (A3). This part is characterized by uniform packet transmission times, with a granularity of over 3 CIR estimates per millisecond, yielding extremely high temporal resolution.
2) In the second part, we use open-source data on Wi-Fi traffic patterns to tune the inter-packet duration and collect more realistic sparse CIR sequences. The resulting CIR measurements, for a total of 9 minutes, are collected with a single subject performing the same 4 activities included in the first part. In the second part, we also use the directional transmission capabilities of our testbed to allow the estimation of the Angle of Arrival (AoA) of the reflections.
We envision our dataset being used by researchers to train and validate machine and deep learning algorithms for fine-grained sensing. Possible use cases include, but are not limited to, the extraction of the micro-Doppler signatures of human movement from the CIR, which enables deep learning-based human activity recognition, and person identification from individual gait features. In addition, new ISAC problems such as the sparse reconstruction of sensing parameters from irregularly sampled signal traces, domain adaptation from regularly sampled signals to sparse ones, and target tracking under missing measurements can also be tackled using the provided dataset.
Data format: Raw CIR measurements are provided as multi-dimensional arrays in MAT-file format (.mat). Each CIR sequence is stored in a separate .mat file containing two fields, CIR and TIME (optional).
CIR is a 3-dimensional array with shape (n_range_bins, n_packets, n_bp), whose dimensions are the number of range bins according to the range resolution of the system (512 in our case), the number of packets transmitted in the sequence, and the number of beam patterns (BPs) used in each packet, respectively.
TIME, if present, contains the time instant, in seconds, at which the corresponding packet was transmitted, relative to the beginning of the measurement. It is omitted in the uniform sequences, which have a fixed inter-frame spacing (IFS), but it is present in the sparse measurements to provide additional information for sparse reconstruction algorithms.
We also provide processed micro-Doppler signatures extracted from the CIR through spectral analysis. These are provided in NumPy array format (.npy), with shape (n_frames, n_freq) representing the number of time frames and frequency samples, respectively. The micro-Doppler spectrograms are obtained with a short-time Fourier transform using a Hann window of W = 64 samples, with consecutive windows overlapping by 32 samples. The obtained spectra are then converted to decibels (dB) and normalized to the range [0, 1] along the frequency axis.
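As a rough illustration of the spectral processing described above, the following sketch computes a micro-Doppler-style spectrogram with NumPy. It is not the dataset's own script (example_use_unif.py is the reference for that). The synthetic input, the choice of a single range bin and beam pattern, the centered frequency axis, the per-frame min-max normalization, and the helper name micro_doppler are all assumptions made for illustration. With real data, the array would instead be read from the CIR field of a .mat file, for example with scipy.io.loadmat, assuming the files are not stored in the HDF5-based v7.3 format.

```python
import numpy as np

def micro_doppler(cir_bin, win_len=64, hop=32):
    """Spectrogram of one complex CIR time series of shape (n_packets,).

    Returns an array of shape (n_frames, n_freq) in dB, min-max normalized
    to [0, 1] along the frequency axis of each frame (an assumed reading of
    the normalization described in the text).
    """
    window = np.hanning(win_len)
    n_frames = 1 + (len(cir_bin) - win_len) // hop
    # Slice the series into overlapping, windowed frames.
    frames = np.stack([cir_bin[i * hop : i * hop + win_len] * window
                       for i in range(n_frames)])
    # FFT along slow time; fftshift puts zero Doppler in the center.
    spec = np.fft.fftshift(np.fft.fft(frames, axis=1), axes=1)
    power_db = 20 * np.log10(np.abs(spec) + 1e-12)  # offset avoids log(0)
    lo = power_db.min(axis=1, keepdims=True)
    hi = power_db.max(axis=1, keepdims=True)
    return (power_db - lo) / (hi - lo)

# Synthetic stand-in for a CIR array of shape (n_range_bins, n_packets, n_bp).
rng = np.random.default_rng(0)
n_range_bins, n_packets, n_bp = 512, 1024, 1
t = np.arange(n_packets) * 0.27e-3  # uniform packet spacing of 0.27 ms
shape = (n_range_bins, n_packets, n_bp)
cir = 0.01 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
cir[100, :, 0] += np.exp(2j * np.pi * 500.0 * t)  # 500 Hz tone in range bin 100

md = micro_doppler(cir[100, :, 0])
print(md.shape)
```

In practice the range bin (or set of bins) to analyze would be chosen from the target's estimated distance rather than fixed by hand as it is here.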
Uniformly sampled CIR sequences:
These are contained in the directory data/uniform_7subj. The CIR sequences are contained in the folder raw_data, divided into subfolders named PERSON<id>, where <id> ∈ {1, ..., 7} identifies the subject. For each subject, sequences are named after the activity performed in that measurement as <activity>_<id>_<index>.mat, where <activity> is one of WALKING (A0), RUNNING (A1), SITTING (A2), or HANDS (A3), and <index> is an incremental integer identifying a specific measurement among those of the same subject and activity. Processed micro-Doppler spectrograms are contained in the folder micro_doppler_stft and share the file name of the raw CIR sequence from which they are extracted.
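The naming convention above can be turned into labels with a small parser. The sketch below uses only the Python standard library. The helper name parse_uniform_name and the example path are hypothetical, and the exact nesting of folders inside the archive is an assumption.

```python
import re
from pathlib import PurePosixPath

# Activity names used in the file names, mapped to the labels from the abstract.
ACTIVITY_LABELS = {"WALKING": "A0", "RUNNING": "A1", "SITTING": "A2", "HANDS": "A3"}
NAME_RE = re.compile(r"^(WALKING|RUNNING|SITTING|HANDS)_(\d+)_(\d+)$")

def parse_uniform_name(path):
    """Split '<activity>_<id>_<index>.mat' (or .npy) into its parts."""
    stem = PurePosixPath(path).stem
    match = NAME_RE.match(stem)
    if match is None:
        raise ValueError(f"unexpected file name: {path}")
    activity, subject, index = match.groups()
    return {
        "activity": activity,
        "label": ACTIVITY_LABELS[activity],
        "subject": int(subject),
        "index": int(index),
    }

# Hypothetical path, built from the convention described in the text.
info = parse_uniform_name("data/uniform_7subj/raw_data/PERSON3/RUNNING_3_2.mat")
print(info)
```

Because the spectrograms share the file name of the raw sequences, the same function can label files in micro_doppler_stft.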
Sparse CIR sequences:
1) Sparse traffic patterns: We provide .txt files containing the packet transmission instants, in seconds, which can be found in the folder info/sampling_patterns. Each file is named as <M>_<pattern>.txt where <M> is the maximum number of packets per window transmitted in that sequence and <pattern> is one of psu_cs, library, and ug. A complete description of the design of such sampling patterns can be found in reference [1] below.
2) CIR sequences: We provide 8.7 s long traces for each of 3 sequence types (psu_cs, library, and ug). Experiments are repeated for different values of the maximum number of packets per window, M = 4, 8, 16, 32, 64. Measurements are contained in the folder raw_data and named according to the convention TEST<id>_<r>_<code>_F2.mat, where <id> is an incremental identifier for each measurement and <r> is an incremental integer identifying repetitions of the same sequence. <code> has the form M<i>, with <i> being an integer that maps the measurement to the corresponding inter-packet times, which are specified by the value of M and the traffic pattern used. The code US denotes tests performed with uniform inter-packet spacing, T = 0.27 ms. For each code, there are 4 tests collected with the associated traffic pattern, covering the 4 activities (A0 to A3). The file experiments_sparse.xls lists, for each test identifier, the code specifying the inter-packet pattern and the activity performed.
3) Micro-Doppler spectrograms: These are obtained from sparse CIR sequences using the method proposed in [1], which is based on the Iterative Hard Thresholding algorithm. They are contained in the folder micro_doppler_iht and named according to the convention test_<id>.npy.
Code examples are provided in the files example_use_unif.py and example_use_sparse.py. The former loads the CIR data and extracts micro-Doppler signatures from the uniformly sampled data; the latter performs people tracking on the sparse data, using delay and angle-of-arrival measurements extracted from the CIR.
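For the sparse part, a minimal sketch of how the file-name conventions and the TIME field might be handled is given below, using only the standard library. The function names, the example file names, and the timestamp values are hypothetical. The mapping from a code such as M3 to a specific value of M and traffic pattern is not reproduced here and has to be looked up in experiments_sparse.xls.

```python
import re

TEST_RE = re.compile(r"^TEST(\d+)_(\d+)_(US|M\d+)_F2\.mat$")
PATTERN_RE = re.compile(r"^(\d+)_(psu_cs|library|ug)\.txt$")

def parse_test_name(name):
    """Split 'TEST<id>_<r>_<code>_F2.mat' into id, repetition, and code."""
    match = TEST_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected CIR file name: {name}")
    test_id, repetition, code = match.groups()
    return {"id": int(test_id), "repetition": int(repetition), "code": code}

def parse_pattern_name(name):
    """Split '<M>_<pattern>.txt' into the value of M and the pattern name."""
    match = PATTERN_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected pattern file name: {name}")
    max_packets, pattern = match.groups()
    return {"max_packets": int(max_packets), "pattern": pattern}

def inter_packet_times(times):
    """Gaps, in seconds, between consecutive packet transmission instants."""
    return [later - earlier for earlier, later in zip(times, times[1:])]

# Hypothetical names and timestamps, for illustration only.
test_info = parse_test_name("TEST12_1_M3_F2.mat")
pattern_info = parse_pattern_name("16_psu_cs.txt")
gaps = inter_packet_times([0.0, 0.00027, 0.0011])
print(test_info, pattern_info, gaps)
```

The same inter_packet_times helper applies to the TIME field of a sparse .mat file or to the instants listed in a sampling-pattern .txt file, once these have been read into a list of floats.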
The full software to replicate the testbed setup used for the experiments can be found under Testbed.
[1] J. Pegoraro, J. O. Lacruz, M. Rossi, and J. Widmer, "SPARCS: A Sparse Recovery Approach for Integrated Communication and Human Sensing in mmWave Systems," in ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN), Milan, Italy, May 2022.
The creation of this dataset has been supported by the European Union’s EU Framework Programme for Research and Innovation Horizon 2020 under Grant Agreement No 861222 – MINTS: “Millimeter-wave networking and sensing for beyond 5G”.
Dataset Files
- uniform_7subj.zip (59.38 GB)
- sparse_1subj.zip (119.53 GB)
- disc_a.tar.gz (59.50 GB)
- scripts_DISC.zip (399.78 MB)
Comments
Many thanks for your efforts in providing this helpful dataset.
I have employed this dataset in my work to train the DQN utilized in several on-policy DRL algorithms such as A2C combined with greedy policy and I got great results. Thank you for sharing such useful information.
This dataset is also suitable to be developed through CGANs.