Smart speakers and voice-based virtual assistants are core components for the success of the IoT paradigm. Unfortunately, they are vulnerable to various privacy threats exploiting machine learning to analyze the generated encrypted traffic. To cope with that, deep adversarial learning approaches can be used to build black-box countermeasures altering the network traffic (e.g., via packet padding) and its statistical information.

Instructions: 

This dataset contains several pcap files generated by the Google Home smart speaker placed under different conditions.

  • Mic_on_off_8h contains two pcap files, generated by keeping the microphone on (with silence) and off, respectively, for 8 hours.
  • Mic_on_off_gquic_8h contains two pcap files, generated by keeping the microphone on (with silence) and off, respectively, for 8 hours, excluding all network traffic not belonging to the Google GQUIC protocol.
  • Mic_on_off_noise_3d contains three pcap files, generated by keeping the microphone on (with silence), off, and on (with noise), respectively, for 3 days.
  • Mic_on_off_noise_gquic_3d contains three pcap files, generated by keeping the microphone on (with silence), off, and on (with noise), respectively, for 3 days, excluding all network traffic not belonging to the Google GQUIC protocol (a minimal filtering sketch follows this list).
  • media_pcap_anonymized contains several pcap files collected after the execution of queries such as "What's the latest news?" or "Play some music" (each file stores the network traffic collected after the execution of a single query).
  • travel_pcap_anonymized contains several pcap files collected after the execution of queries such as "How is the weather today?" (each file stores the network traffic collected after the execution of a single query).
  • utilities_pcap_anonymized contains several pcap files collected after the execution of queries such as "What's on my agenda today?" or "What time is it?" (each file stores the network traffic collected after the execution of a single query).
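
For readers who want to inspect the captures programmatically, the sketch below shows one possible way to isolate the GQUIC traffic (which the Google Home typically carries over UDP port 443) using scapy. The file path is hypothetical and scapy is not part of the dataset; for the multi-day captures, scapy's streaming PcapReader may be preferable to loading the whole file at once.

```
# Minimal sketch (assumptions: scapy installed, hypothetical file path).
from scapy.all import UDP, rdpcap

packets = rdpcap("Mic_on_off_8h/mic_on.pcap")  # hypothetical path inside the dataset

# GQUIC traffic is carried over UDP port 443, so keep only matching packets.
gquic = [p for p in packets if UDP in p and 443 in (p[UDP].sport, p[UDP].dport)]

print(f"{len(gquic)} of {len(packets)} packets are UDP/443 (candidate GQUIC)")
```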

This dataset is part of my Master's research on malware detection and classification using the XGBoost library on an Nvidia GPU. The dataset is a collection of 1.55 million samples with 1,000 API-import features each, extracted from the JSONL releases of the EMBER dataset (2017 v2 and 2018). All data is pre-processed and duplicate records have been removed. The dataset contains 800,000 malware and 750,000 "goodware" samples.

Instructions: 

* FEATURES *

Column name:  sha256

Description: SHA256 hash of the example

Type: string

 

Column name:  appeared

Description: date when the sample first appeared

Type: date (yyyy-mm format)

 

Column name:  label

Description: indicates whether the sample is malware or "goodware"

Type: 0 ("goodware") or 1 (malware)

 

Column name: GetProcAddress

Description: the most frequently imported function (ranked 1st of the 1,000 selected imports)

Type: 0 (Not imported) or 1 (Imported)

 

...

Column name: LookupAccountSidW

Description: the least frequently imported function among the 1,000 selected imports (ranked 1000th)

Type: 0 (Not imported) or 1 (Imported)

 

The full dataset features header can be downloaded at https://github.com/tvquynh/api_import_dataset/blob/main/full_dataset_fea...

All processing code will be uploaded to https://github.com/tvquynh/api_import_dataset/
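
As a rough illustration of how these features can be consumed, the sketch below loads a hypothetical CSV export of the dataset with pandas and trains an XGBoost classifier. The file name, split, and hyperparameters are assumptions for illustration only, not the exact setup used in the thesis.

```
# Minimal sketch: classify samples from the 1,000 binary API-import features.
# The CSV file name and the hyperparameters are assumptions, not part of the dataset.
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split

df = pd.read_csv("api_import_dataset.csv")             # hypothetical file name

X = df.drop(columns=["sha256", "appeared", "label"])    # the 1,000 API-import columns
y = df["label"]                                         # 0 = "goodware", 1 = malware

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# tree_method="hist" runs on the CPU; depending on the installed XGBoost version,
# GPU training can be enabled with device="cuda" (2.x) or tree_method="gpu_hist" (1.x).
model = xgb.XGBClassifier(n_estimators=300, max_depth=6, tree_method="hist")
model.fit(X_train, y_train)

print("held-out accuracy:", model.score(X_test, y_test))
```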


As an alternative to classical cryptography, Physical Layer Security (PhySec) provides primitives to achieve fundamental security goals like confidentiality, authentication, or key derivation. Owing to its origins in the field of information theory, these primitives are rigorously analysed and their information-theoretic security is proven. Nevertheless, the practical realizations of the different approaches take certain assumptions about the physical world for granted.

Instructions: 

The data is provided as zipped NumPy arrays with custom headers. To load a file, the NumPy package is required.

The respective load primitive (np.load) allows for straightforward loading of the datasets.

To load a file “file.npz” the following code is sufficient:

import numpy as np

measurement = np.load('file.npz', allow_pickle=False)

header, data = measurement['header'], measurement['data']

The dataset comes with a supplementary script example_script.py illustrating the basic usage of the dataset.
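
As a quick sanity check after loading, something like the following can list the stored arrays and their shapes; the exact header contents are measurement-specific, so no header fields are assumed here.

```
import numpy as np

measurement = np.load('file.npz', allow_pickle=False)

# The NpzFile object exposes the names of the stored arrays (e.g. 'header' and 'data').
print(measurement.files)
print(measurement['data'].shape, measurement['data'].dtype)
```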


The emerging 5G services offer numerous new opportunities for networked applications. In this study, we seek to answer two key questions: i) is the throughput of mmWave 5G predictable, and ii) can we build "good" machine learning models for 5G throughput prediction? To this end, we conduct a measurement study of commercial mmWave 5G services in a major U.S. city, focusing on the throughput as perceived by applications running on user equipment (UE).

Instructions: 

DATASET WEBSITE: https://lumos5g.umn.edu/

## OVERVIEW

Lumos5G 1.0 is a dataset that represents the `Loop` area of the IMC'20 paper - "Lumos5G: Mapping and Predicting Commercial mmWave 5G Throughput". The Loop area is a 1300-meter loop near U.S. Bank Stadium in the downtown Minneapolis area that covers roads, railroad crossings, restaurants, coffee shops, and recreational outdoor parks.

This dataset is being made available to the research community.

## DATASET COLUMNS AND DESCRIPTION

The descriptions of the columns in the dataset CSV, from left to right, are:

- `run_num`: Indicates the run number. For each trajectory and mobility mode, we conduct several runs of experiments.
- `seq_num`: This is the sequence number. For each run, the sequence number acts like an index or a per-second timeline.
- `abstractSignalStr`: Indicates the abstract signal strength as reported by the Android API (https://developer.android.com/reference/android/telephony/SignalStrength...()). Regardless of whether the UE was connected to 5G service, this column always reported a value associated with the LTE/4G radio. Note: for signal strength values related to 5G-NR, refer to the columns `nr_ssRsrp`, `nr_ssRsrq`, and `nr_ssSinr`.
- `latitude`: The latitude in degrees as reported by Android's API (https://developer.android.com/reference/android/location/Location#getLat...()).
- `longitude`: The longitude in degrees as reported by Android's API (https://developer.android.com/reference/android/location/Location#getLon...()).
- `movingSpeed`: The ground mobility/moving speed of the UE as reported by Android's API (https://developer.android.com/reference/android/location/Location#getSpeed()). The unit is meters per second.
- `compassDirection`: The bearing in degrees as reported by Android's API (https://developer.android.com/reference/android/location/Location#getBea...()). Bearing is the horizontal direction of travel of this device, and is not related to the device orientation. It is guaranteed to be in the range `(0.0, 360.0]` if the device has a bearing.
- `nrStatus`: Indicates whether the UE was connected to a 5G network. When `nrStatus=CONNECTED`, the UE was connected to 5G. All other values of `nrStatus`, such as `NOT_RESTRICTED` and `NONE`, indicate the UE was not connected to 5G. `nrStatus` was obtained by parsing the raw string representation of the `ServiceState` object (https://developer.android.com/reference/android/telephony/ServiceState#t...()).
- `lte_rssi`: Received Signal Strength Indication (RSSI) in dBm of the primary serving LTE cell. The value range is [-113, -51] inclusive, or CellInfo#UNAVAILABLE if unavailable. Reference: TS 27.007 8.5 Signal quality +CSQ.
- `lte_rsrp`: Reference signal received power (RSRP) in dBm of the primary serving LTE cell.
- `lte_rsrq`: Reference signal received quality (RSRQ) of the primary serving LTE cell.
- `lte_rssnr`: Reference signal signal-to-noise ratio (RSSNR) of the primary serving LTE cell.
- `nr_ssRsrp`: Obtained by parsing the raw string representation of `SignalStrength` object (https://developer.android.com/reference/android/telephony/SignalStrength...()). `nr_ssRsrp` was a field in this object's `CellSignalStrengthNr` section. In general, this value was only available when the UE was connected to 5G (i.e., when `nrStatus=CONNECTED`). Reference: 3GPP TS 38.215. Range: -140 dBm to -44 dBm.
- `nr_ssRsrq`: Obtained by parsing the raw string representation of `SignalStrength` object (https://developer.android.com/reference/android/telephony/SignalStrength...()). `nr_ssRsrq` was a field in this object's `CellSignalStrengthNr` section. In general, this value was only available when the UE was connected to 5G (i.e., when `nrStatus=CONNECTED`). Reference: 3GPP TS 38.215. Range: -20 dB to -3 dB.
- `nr_ssSinr`: Obtained by parsing the raw string representation of `SignalStrength` object (https://developer.android.com/reference/android/telephony/SignalStrength...()). `nr_ssSinr` was a field in this object's `CellSignalStrengthNr` section. In general, this value was only available when the UE was connected to 5G (i.e., when `nrStatus=CONNECTED`). Reference: 3GPP TS 38.215 Sec 5.1.*, 3GPP TS 38.133 10.1.16.1 Range: -23 dB to 40 dB
- `Throughput`: Indicates the throughput perceived by the UE. iPerf 3.7 was used to measure the per-second TCP downlink throughput at the UE.
- `mobility_mode`: Indicates the ground truth about the mobility mode when the experiment was conducted. This value is either walking or driving.
- `trajectory_direction`: Indicates the ground truth about the trajectory direction of the experiment conducted at the Loop area. `CW` indicates clockwise direction, while `ACW` indicates anti-clockwise. Note, the driving experiments were only conducted in `CW` direction as certain parts of the loop were one way only. Walking-based experiments were conducted in both directions.
- `tower_id`: Indicates the (anonymized) tower identifier.

Note: We found that availability (and at times even the values) of `lte_rssi`, `nr_ssRsrp`, `nr_ssRsrq` and `nr_ssSinr` were not reliable. Since these values were sampled every second, at certain times (e.g., boundary cases), we might still find NR-related values when `nrStatus` is not equal to `CONNECTED`. However, in this dataset, we still include all the raw values as reported by the APIs.
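
To make the column layout above concrete, the short sketch below loads the CSV with pandas and summarizes per-second throughput by mobility mode and 5G connectivity. The file name is a placeholder; the column names follow the descriptions above.

```
# Minimal exploration sketch; the CSV file name is a placeholder.
import pandas as pd

df = pd.read_csv("Lumos5G-v1.0.csv")

# Throughput behaves very differently depending on 5G connectivity,
# so group by nrStatus and the ground-truth mobility mode.
summary = (
    df.groupby(["mobility_mode", "nrStatus"])["Throughput"]
      .agg(["count", "mean", "median"])
)
print(summary)
```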

## CITING THE DATASET

```
@inproceedings{10.1145/3419394.3423629,
author = {Narayanan, Arvind and Ramadan, Eman and Mehta, Rishabh and Hu, Xinyue and Liu, Qingxu and Fezeu, Rostand A. K. and Dayalan, Udhaya Kumar and Verma, Saurabh and Ji, Peiqi and Li, Tao and Qian, Feng and Zhang, Zhi-Li},
title = {Lumos5G: Mapping and Predicting Commercial MmWave 5G Throughput},
year = {2020},
isbn = {9781450381383},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3419394.3423629},
doi = {10.1145/3419394.3423629},
booktitle = {Proceedings of the ACM Internet Measurement Conference},
pages = {176–193},
numpages = {18},
keywords = {bandwidth estimation, mmWave, machine learning, Lumos5G, throughput prediction, deep learning, prediction, 5G},
location = {Virtual Event, USA},
series = {IMC '20}
}
```

## QUESTIONS?

Please feel free to contact the FiveGophers/Lumos5G team for questions or information about the data (arvind@cs.umn.edu, eman@cs.umn.edu, zhzhang@cs.umn.edu, fengqian@umn.edu, fivegophers@umn.edu).

## LICENSE

Lumos5G 1.0 dataset is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.


This is a supplementary data file, providing the data used to evaluate the performance of our 3D fully convolutional neural network, which removes reverberation noise from ultrasound channel data. The dataset consists of ultrasound channel data simulated in Field II Pro, with artificial reverberation and thermal noise added. This dataset will be linked to our publication once it is accepted.


The Holoscopic micro-gesture recognition (HoMG) database was recorded using a holoscopic 3D camera and contains 3 conventional gestures from 40 participants under different settings and conditions. The principle of holoscopic 3D (H3D) imaging mimics the fly's-eye technique, capturing a true 3D optical model of the scene using a microlens array. For the purpose of H3D micro-gesture recognition, the HoMG database has two subsets: the video subset has 960 videos and the image subset has 30,635 images, and both cover three types of micro-gestures (classes).

Instructions: 

The Holoscopic micro-gesture recognition (HoMG) database consists of 3 hand gestures: Button, Dial, and Slider, performed by 40 subjects of various ages under different settings, including right and left hands and two recording distances.

For the video subset: there are 40 subjects, and each subject has 24 videos covering the different settings and the three gestures. Each video was recorded at 25 frames per second, and video lengths vary from a few seconds to 20 seconds. The whole subset was divided into 3 parts: 20 subjects for the training set, 10 subjects for the development set, and another 10 subjects for the testing set.

For the image subset: video captures the motion information of a micro-gesture and is therefore well suited for micro-gesture recognition. From each video recording, a varying number of frames were selected as still micro-gesture images. The image resolution is 1920 by 1080. In total, 30,635 images were selected. The whole subset was split into three partitions: training, development, and testing. There are 15,237 images in the training partition, from 20 participants, with 8,364 at close distance and 6,853 at far distance. There are 6,956 images in the development partition, from 10 participants, with 3,077 at close distance and 3,879 at far distance. There are 8,442 images in the testing partition, from 10 participants, with 3,930 at close distance and 4,512 at far distance.


One paramount challenge in multi-ion-sensing arises from ion interference, which degrades the accuracy of sensor calibration. Machine learning models are proposed here to optimize such multivariate calibration. However, the acquisition of big experimental data is time- and resource-consuming in practice, necessitating new paradigms and efficient models for these data-limited frameworks. Therefore, a novel approach is presented in this work, where a multi-ion-sensing emulator is designed to explain the response of an ion-sensing array in a mixed-ion environment.


Predicting energy consumption is currently a key challenge for the energy industry as a whole. Predicting the consumption in a certain area is highly complex due to sudden changes in the way that energy is being consumed and generated at the current point in time. However, this prediction is essential to minimise costs and to enable automatic adjustment of energy production and better load balancing between different energy sources.


The ability to detect human postures is particularly important in several fields such as ambient intelligence, surveillance, elderly care, and human-machine interaction. Most of the earlier works in this area are based on computer vision. However, these works are mostly limited in providing a real-time solution for the detection activities. Therefore, we are currently working toward an Internet of Things (IoT) based solution for human posture recognition.


Dataset used in the learning process of the traditional techniques' operation. Considering different devices and scenarios, the proposed approach can adapt its response to the device in use, identify the MAC-layer protocol, switch to the protocol in use, and make the device operate with the best possible configuration.

