This dataset supports researchers in validating solutions such as Intrusion Detection Systems (IDS) based on artificial intelligence and machine learning techniques for the detection and categorization of threats in Cyber-Physical Systems (CPS). To that end, data were acquired from a Secure Water Treatment (SWaT) hardware-in-the-loop testbed, which emulates water passage between nine tanks via solenoid valves, pumps, and pressure and flow sensors. The testbed is composed of a real partition that is virtually connected to a simulated one.

Instructions: 

This dataset is related to the paper "A hardware-in-the-loop Secure Water Treatment dataset for cyber-physical security testing".
We provide four different acquisitions:
1) A normal acquisition without attacks ("normal.csv" for network traffic and "dataset_norm.csv" for physical measures)
2) Three acquisitions where different types of attacks and physical faults are reproduced ("attack_1.csv", "attack_2.csv" and "attack_3.csv" for network traffic and "dataset_att_1.csv", "dataset_att_2.csv" and "dataset_att_3.csv" for physical measures)
In addition to .csv files we provide four .pcap files ("attack_1.pcap", "attack_2.pcap", "attack_3.pcap" and "normal.pcap") which refer to network acquisitions for the four previous scenarios.
A README.xlsx file summarizes the key features of the entire dataset.
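
As a minimal sketch, the file names listed above can be grouped by scenario and checked against a local download. The flat directory layout and the helper name `missing_files` are assumptions made for illustration; only the file names come from the description.

```python
from pathlib import Path

# File names come from the dataset description; grouping them per scenario
# and expecting them in a single flat directory are assumptions.
SCENARIOS = {
    "normal": {
        "network": "normal.csv",
        "physical": "dataset_norm.csv",
        "pcap": "normal.pcap",
    },
    "attack_1": {
        "network": "attack_1.csv",
        "physical": "dataset_att_1.csv",
        "pcap": "attack_1.pcap",
    },
    "attack_2": {
        "network": "attack_2.csv",
        "physical": "dataset_att_2.csv",
        "pcap": "attack_2.pcap",
    },
    "attack_3": {
        "network": "attack_3.csv",
        "physical": "dataset_att_3.csv",
        "pcap": "attack_3.pcap",
    },
}


def missing_files(root):
    """Return the sorted names of expected files that are absent under root."""
    root = Path(root)
    return sorted(
        name
        for files in SCENARIOS.values()
        for name in files.values()
        if not (root / name).is_file()
    )
```

Calling `missing_files("path/to/download")` before any analysis gives a quick check that all four acquisitions are complete.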


There is an unmet need for quick, physically small, and cost-effective office-based techniques that can measure bone properties without the use of ionizing radiation. The present study reports the application of a neural network classifier to previously collected data on very low power radiofrequency propagation through the wrist, with the goal of detecting osteoporotic/osteopenic conditions. Our approach categorizes the data obtained for two dichotomous groups. Group 1 included 27 osteoporotic/osteopenic subjects with low BMD (DXA T-score below -1) measured within one year.


This dataset was prepared to estimate the winding temperature of a BLDC motor for a variable load and speed profile. It contains two files. The first one is the measurement results for the motor without cooling, while the second one is the measurement results after installing an additional cooling fan on the shaft. The data included in the files are time stamp, winding temperature, casing temperature, speed, current, power loss, mean and standard deviation of the measured quantities for 14400 data records.


The BIMCV-COVID19- dataset is a large dataset of chest X-ray images (CXR; CR, DX) and computed tomography (CT) images of non-COVID-19 patients, along with their radiographic findings, pathologies, polymerase chain reaction (PCR), immunoglobulin G (IgG) and immunoglobulin M (IgM) diagnostic antibody tests, and radiographic reports from the Valencian Region Medical Image Bank (BIMCV).

Instructions: 

Once all the compressed files have been downloaded, use 00_extract_data.sh to decompress them correctly. For more information, see the links on this page.


The BIMCV-COVID19+ dataset is a large dataset of chest X-ray images (CXR; CR, DX) and computed tomography (CT) images of COVID-19 patients, along with their radiographic findings, pathologies, polymerase chain reaction (PCR), immunoglobulin G (IgG) and immunoglobulin M (IgM) diagnostic antibody tests, and radiographic reports from the Valencian Region Medical Image Bank (BIMCV).

Instructions: 

Once all the compressed files have been downloaded, use 00_extract_data.sh to decompress them correctly. For more information, see the links on this page.


Smart speakers and voice-based virtual assistants are core components for the success of the IoT paradigm. Unfortunately, they are vulnerable to various privacy threats exploiting machine learning to analyze the generated encrypted traffic. To cope with that, deep adversarial learning approaches can be used to build black-box countermeasures altering the network traffic (e.g., via packet padding) and its statistical information.

Instructions: 

This dataset contains several pcap files generated by the Google Home smart speaker placed under different conditions.

  • Mic_on_off_8h contains two pcap files generated by keeping the microphone on (with silence) and off, respectively, for 8 hours.
  • Mic_on_off_gquic_8h contains two pcap files generated by keeping the microphone on (with silence) and off, respectively, for 8 hours, excluding all network traffic not belonging to the Google QUIC (gquic) protocol.
  • Mic_on_off_noise_3d contains three pcap files generated by keeping the microphone on (with silence), off, and on (with noise), respectively, for 3 days.
  • Mic_on_off_noise_gquic_3d contains three pcap files generated by keeping the microphone on (with silence), off, and on (with noise), respectively, for 3 days, excluding all network traffic not belonging to the Google QUIC (gquic) protocol.
  • media_pcap_anonymized contains several pcap files captured after the execution of queries such as "What's the latest news?" or "Play some music" (each file stores the network traffic collected after the execution of one query).
  • travel_pcap_anonymized contains several pcap files captured after the execution of queries such as "How is the weather today?" (each file stores the network traffic collected after the execution of one query).
  • utilities_pcap_anonymized contains several pcap files captured after the execution of queries such as "What's on my agenda today?" or "What time is it?" (each file stores the network traffic collected after the execution of one query).
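
Traffic analysis of encrypted flows usually starts from packet sizes and timings. The sketch below reads those two quantities from a capture using only the Python standard library. It assumes the files are in the classic libpcap format (not pcapng), which the description does not state; the function name `packet_lengths` is hypothetical.

```python
import struct

# Magic numbers of the classic pcap format, as they appear on disk.
_LITTLE_ENDIAN = {b"\xd4\xc3\xb2\xa1": 1_000_000, b"\x4d\x3c\xb2\xa1": 1_000_000_000}
_BIG_ENDIAN = {b"\xa1\xb2\xc3\xd4": 1_000_000, b"\xa1\xb2\x3c\x4d": 1_000_000_000}


def packet_lengths(path):
    """Yield (timestamp_in_seconds, original_length) for each packet record.

    Assumes a classic pcap file; pcapng captures are rejected.
    """
    with open(path, "rb") as f:
        header = f.read(24)  # global header is 24 bytes
        if len(header) < 24:
            raise ValueError("file too short for a pcap global header")
        magic = header[:4]
        if magic in _LITTLE_ENDIAN:
            endian, frac_div = "<", _LITTLE_ENDIAN[magic]
        elif magic in _BIG_ENDIAN:
            endian, frac_div = ">", _BIG_ENDIAN[magic]
        else:
            raise ValueError("not a classic pcap file")
        while True:
            record = f.read(16)  # per-packet header is 16 bytes
            if len(record) < 16:
                break
            ts_sec, ts_frac, incl_len, orig_len = struct.unpack(endian + "IIII", record)
            f.seek(incl_len, 1)  # skip the captured bytes, only metadata is needed
            yield ts_sec + ts_frac / frac_div, orig_len
```

The resulting sequence of (time, size) pairs is the kind of statistical information that a classifier could consume, or that a padding countermeasure would try to alter.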

This dataset is part of my Master's research on malware detection and classification using the XGBoost library on Nvidia GPU hardware. The dataset is a collection of 1.55 million samples, each described by 1000 API import features extracted from the jsonl format of the EMBER 2017 v2 and 2018 datasets. All data have been pre-processed and duplicate records have been removed. The dataset contains 800,000 malware and 750,000 "goodware" samples.

Instructions: 

* FEATURES *

Column name: sha256
Description: SHA256 hash of the sample
Type: string

Column name: appeared
Description: date the sample appeared
Type: date (yyyy-mm format)

Column name: label
Description: whether the sample is malware or "goodware"
Type: 0 ("goodware") or 1 (malware)

Column name: GetProcAddress
Description: Most imported function (1st)
Type: 0 (Not imported) or 1 (Imported)

...

Column name: LookupAccountSidW
Description: Least imported function (1000th)
Type: 0 (Not imported) or 1 (Imported)

The full dataset features header can be downloaded at https://github.com/tvquynh/api_import_dataset/blob/main/full_dataset_fea...

All processing code will be uploaded to https://github.com/tvquynh/api_import_dataset/
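
To illustrate the column layout described above, here is a minimal sketch that separates the metadata columns from the binary API import features. It assumes the data is distributed as a CSV file with a header row, which the description does not confirm; the function name `read_samples` is hypothetical, and the author's own processing code may differ.

```python
import csv

# Metadata columns named in the feature description; every other column is
# treated as a 0/1 API import feature. The CSV format itself is an assumption.
META_COLUMNS = ("sha256", "appeared", "label")


def read_samples(handle):
    """Return (feature_names, feature_rows, labels) from an open CSV handle."""
    reader = csv.DictReader(handle)
    feature_names = [c for c in reader.fieldnames if c not in META_COLUMNS]
    feature_rows, labels = [], []
    for record in reader:
        labels.append(int(record["label"]))  # 0 = "goodware", 1 = malware
        feature_rows.append([int(record[name]) for name in feature_names])
    return feature_names, feature_rows, labels
```

The returned feature rows and labels form the kind of matrix and target vector that a gradient-boosting classifier expects; the training step is not shown here.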


As an alternative to classical cryptography, Physical Layer Security (PhySec) provides primitives to achieve fundamental security goals like confidentiality, authentication or key derivation. Owing to their origins in information theory, these primitives are rigorously analysed and their information-theoretic security is proven. Nevertheless, the practical realizations of the different approaches take certain assumptions about the physical world for granted.

Instructions: 

The data is provided as zipped NumPy arrays with custom headers. To load a file, the NumPy package is required.

NumPy's load function allows for straightforward loading of the datasets.

To load a file "file.npz", the following code is sufficient:

import numpy as np

# Open the archive; pickled objects are refused for safety
measurement = np.load('file.npz', allow_pickle=False)
# Extract the custom header and the measurement data by key
header, data = measurement['header'], measurement['data']

The dataset comes with a supplementary script example_script.py illustrating the basic usage of the dataset.


The emerging 5G services offer numerous new opportunities for networked applications. In this study, we seek to answer two key questions: i) is the throughput of mmWave 5G predictable, and ii) can we build "good" machine learning models for 5G throughput prediction? To this end, we conduct a measurement study of commercial mmWave 5G services in a major U.S. city, focusing on the throughput as perceived by applications running on user equipment (UE).

Instructions: 

DATASET WEBSITE: https://lumos5g.umn.edu/

## OVERVIEW

Lumos5G 1.0 is a dataset that represents the `Loop` area of the IMC'20 paper "Lumos5G: Mapping and Predicting Commercial mmWave 5G Throughput". The Loop area is a 1300 meter loop near U.S. Bank Stadium in the downtown Minneapolis area that covers roads, railroad crossings, restaurants, coffee shops, and recreational outdoor parks.

This dataset is being made available to the research community.

## DATASET COLUMNS AND DESCRIPTION

The columns in the dataset CSV, from left to right, are:

- `run_num`: Indicates the run number. For each trajectory and mobility mode, we conduct several runs of experiments.
- `seq_num`: This is the sequence number. For each run, the sequence number acts like an index or a per-second timeline.
- `abstractSignalStr`: Indicates the abstract signal strength as reported by Android's API (https://developer.android.com/reference/android/telephony/SignalStrength...()). Regardless of whether the UE was connected to 5G service, this column always reported a value associated with the LTE/4G radio. Note: for signal strength values related to 5G-NR, refer to other columns such as `nr_ssRsrp`, `nr_ssRsrq`, and `nr_ssSinr`.
- `latitude`: The latitude in degrees as reported by Android's API (https://developer.android.com/reference/android/location/Location#getLat...()).
- `longitude`: The longitude in degrees as reported by Android's API (https://developer.android.com/reference/android/location/Location#getLon...()).
- `movingSpeed`: The ground mobility/moving speed of the UE as reported by Android's API (https://developer.android.com/reference/android/location/Location#getSpeed()). The unit is meters per second.
- `compassDirection`: The bearing in degrees as reported by Android's API (https://developer.android.com/reference/android/location/Location#getBea...()). Bearing is the horizontal direction of travel of this device, and is not related to the device orientation. It is guaranteed to be in the range `(0.0, 360.0]` if the device has a bearing.
- `nrStatus`: Indicates if the UE was connected to 5G network or not. When `nrStatus=CONNECTED`, the UE was connected to 5G. All other values of `nrStatus` such as `NOT_RESTRICTED` and `NONE` indicate the UE was not connected to 5G. `nrStatus` was obtained by parsing the raw string representation of `ServiceState` object (https://developer.android.com/reference/android/telephony/ServiceState#t...()).
- `lte_rssi`: Get Received Signal Strength Indication (RSSI) in dBm of the primary serving LTE cell. The value range is [-113, -51] inclusively or CellInfo#UNAVAILABLE if unavailable. Reference: TS 27.007 8.5 Signal quality +CSQ.
- `lte_rsrp`: Get reference signal received power (RSRP) in dBm of the primary serving LTE cell.
- `lte_rsrq`: Get reference signal received quality (RSRQ) of the primary serving LTE cell.
- `lte_rssnr`: Get reference signal signal-to-noise ratio (RSSNR) of the primary serving LTE cell.
- `nr_ssRsrp`: Obtained by parsing the raw string representation of `SignalStrength` object (https://developer.android.com/reference/android/telephony/SignalStrength...()). `nr_ssRsrp` was a field in this object's `CellSignalStrengthNr` section. In general, this value was only available when the UE was connected to 5G (i.e., when `nrStatus=CONNECTED`). Reference: 3GPP TS 38.215. Range: -140 dBm to -44 dBm.
- `nr_ssRsrq`: Obtained by parsing the raw string representation of `SignalStrength` object (https://developer.android.com/reference/android/telephony/SignalStrength...()). `nr_ssRsrq` was a field in this object's `CellSignalStrengthNr` section. In general, this value was only available when the UE was connected to 5G (i.e., when `nrStatus=CONNECTED`). Reference: 3GPP TS 38.215. Range: -20 dB to -3 dB.
- `nr_ssSinr`: Obtained by parsing the raw string representation of `SignalStrength` object (https://developer.android.com/reference/android/telephony/SignalStrength...()). `nr_ssSinr` was a field in this object's `CellSignalStrengthNr` section. In general, this value was only available when the UE was connected to 5G (i.e., when `nrStatus=CONNECTED`). Reference: 3GPP TS 38.215 Sec 5.1.*, 3GPP TS 38.133 10.1.16.1. Range: -23 dB to 40 dB.
- `Throughput`: Indicates the throughput perceived by the UE. iPerf 3.7 was used to measure the per-second TCP downlink at the UE.
- `mobility_mode`: Indicates the ground truth about the mobility mode when the experiment was conducted. This value can be either walking or driving.
- `trajectory_direction`: Indicates the ground truth about the trajectory direction of the experiment conducted at the Loop area. `CW` indicates clockwise direction, while `ACW` indicates anti-clockwise. Note, the driving experiments were only conducted in `CW` direction as certain parts of the loop were one way only. Walking-based experiments were conducted in both directions.
- `tower_id`: Indicates the (anonymized) tower identifier.

Note: We found that availability (and at times even the values) of `lte_rssi`, `nr_ssRsrp`, `nr_ssRsrq` and `nr_ssSinr` were not reliable. Since these values were sampled every second, at certain times (e.g., boundary cases), we might still find NR-related values when `nrStatus` is not equal to `CONNECTED`. However, in this dataset, we still include all the raw values as reported by the APIs.
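
As a rough illustration of how these columns fit together, the sketch below averages `Throughput` per `mobility_mode`, optionally keeping only the samples where `nrStatus` equals `CONNECTED`. The column names come from the list above; the function name `mean_throughput_by_mode` and the choice to skip rows with a non-numeric throughput are assumptions, not part of the dataset's tooling.

```python
import csv
from collections import defaultdict


def mean_throughput_by_mode(handle, connected_only=True):
    """Return {mobility_mode: mean Throughput} from an open CSV handle.

    With connected_only=True, rows where nrStatus is not CONNECTED are ignored.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(handle):
        if connected_only and row["nrStatus"] != "CONNECTED":
            continue
        try:
            value = float(row["Throughput"])
        except (TypeError, ValueError):
            continue  # assumption: skip rows with missing or non-numeric values
        totals[row["mobility_mode"]] += value
        counts[row["mobility_mode"]] += 1
    return {mode: totals[mode] / counts[mode] for mode in counts}
```

Because NR-related values can be unreliable at boundary cases, comparing the result with `connected_only=False` is a simple way to see how much the filter changes the picture.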

## CITING THE DATASET

```
@inproceedings{10.1145/3419394.3423629,
author = {Narayanan, Arvind and Ramadan, Eman and Mehta, Rishabh and Hu, Xinyue and Liu, Qingxu and Fezeu, Rostand A. K. and Dayalan, Udhaya Kumar and Verma, Saurabh and Ji, Peiqi and Li, Tao and Qian, Feng and Zhang, Zhi-Li},
title = {Lumos5G: Mapping and Predicting Commercial MmWave 5G Throughput},
year = {2020},
isbn = {9781450381383},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3419394.3423629},
doi = {10.1145/3419394.3423629},
booktitle = {Proceedings of the ACM Internet Measurement Conference},
pages = {176–193},
numpages = {18},
keywords = {bandwidth estimation, mmWave, machine learning, Lumos5G, throughput prediction, deep learning, prediction, 5G},
location = {Virtual Event, USA},
series = {IMC '20}
}
```

## QUESTIONS?

Please feel free to contact the FiveGophers/Lumos5G team for questions or information about the data (arvind@cs.umn.edu, eman@cs.umn.edu, zhzhang@cs.umn.edu, fengqian@umn.edu, fivegophers@umn.edu).

## LICENSE

Lumos5G 1.0 dataset is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.


The following tweets, related to the search keywords above, were extracted between 15 Nov 2020 and 10 Jan 2021:

29,499 English tweets

4,628 Japanese tweets

678 Hindi tweets

