The Real-World Multimodal Foodlog (RWMF) database is built for evaluating multimodal retrieval algorithms in real-life dietary environments. It contains 7500 multimodal pairs in total, where each image can be related to multiple texts and each text can be related to multiple images. Details of this database can be found in this paper: Pengfei Zhou, Cong Bai, Kaining Ying, Jie Xia, Lixin Huang, RWMF: Real-World Multimodal Foodlog Database, ICPR 2020
Since this is a multimodal database, the images in RWMF are related to texts by sharing the same tag, which is saved in `Foodhealth/im_label`
* `Foodlog`: the real-world food images and the associated instant bio-data
** `Image`: the folder that contains all the real-world foodlog images.
** `biodata.csv`: the csv file that contains all the associated instant bio-data. These data are linked to the food images by the image file names.
** `biodata.txt`: the txt file that describes the attributes of each column in `biodata.csv`.
** `data_category.csv`: the health category tags that help test the model's cross-modal retrieval performance.
** `data_category.txt`: the txt file that describes the attributes of each column in `data_category.csv`.
* `Foodhealth`: the food description texts and the associated food nutrition composition data
** `description.csv`: the csv file that contains all the food description texts for each tag.
** `description.txt`: the txt file that describes the attributes of each column in `description.csv`.
** `composition.csv`: the csv file that contains all the food nutrition composition data for each tag.
** `composition.txt`: the txt file that describes the attributes of each column in `composition.csv`.
** `im_label.csv`: the csv file that contains all the tags related to each image.
** `im_label.txt`: the txt file that describes the attributes of each column in `im_label.csv`.
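The tag-based link between images and texts can be sketched in a few lines of Python. This is a minimal illustration only: the row layouts, file names and tag values below are hypothetical, and the real column attributes are the ones documented in `im_label.txt` and `description.txt`.

```python
from collections import defaultdict

# Hypothetical rows: (image file name, tag) as might be read from im_label.csv,
# and (tag, description text) as might be read from description.csv.
im_label_rows = [
    ("img_001.jpg", "rice"),
    ("img_001.jpg", "egg"),
    ("img_002.jpg", "rice"),
]
description_rows = [
    ("rice", "Steamed white rice."),
    ("egg", "Boiled chicken egg."),
]


def build_image_to_texts(im_label_rows, description_rows):
    """Map each image to every description that shares one of its tags."""
    texts_by_tag = defaultdict(list)
    for tag, text in description_rows:
        texts_by_tag[tag].append(text)

    texts_by_image = defaultdict(list)
    for image, tag in im_label_rows:
        # An image with several tags collects the texts of all of them.
        texts_by_image[image].extend(texts_by_tag.get(tag, []))
    return dict(texts_by_image)


image_to_texts = build_image_to_texts(im_label_rows, description_rows)
```

Because the join goes through the tag, one image can map to several texts and one text can be reached from several images, which is the many-to-many relation described above.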
The following data set is modelled after the implementers’ test data in 3GPP TS 33.501 “Security architecture and procedures for 5G System” with the same terminology. The data set corresponds to SUCI (Subscription Concealed Identifier) computation in the 5G UE (User Equipment) for IMSI (International Mobile Subscriber Identity) based SUPI (Subscription Permanent Identifier) and ECIES Profile A. The IMSI consists of MCC|MNC: '274012'.
In the 5G system, the globally unique 5G subscription permanent identifier is called SUPI as defined in 3GPP TS 23.501. For privacy reasons, the SUPI from the 5G devices should not be transferred in clear text, and is instead concealed inside the privacy-preserving SUCI. Consequently, the SUPI is privacy protected over the air on the 5G radio network by using the SUCI. For SUCIs containing an IMSI-based SUPI, the UE in essence conceals the MSIN (Mobile Subscriber Identification Number) part of the IMSI. On the 5G operator side, the SIDF (Subscription Identifier De-concealing Function) of the UDM (Unified Data Management) is responsible for de-concealment of the SUCI and resolves the SUPI from the SUCI based on the protection scheme used to generate the SUCI.
The SUCI protection scheme used in this data set is ECIES Profile A. The scheme output totals 360 bits: a 256-bit public key, a 64-bit MAC and a 40-bit encrypted MSIN. The SUCI scheme-input MSIN is coded as hexadecimal digits using packed BCD coding where the order of digits within an octet is the same as the order of MSIN. As the MSINs have an odd number of digits, bits 5 to 8 of the final octet are coded as ‘1111’.
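The packed BCD coding of the scheme input can be sketched as follows. This is a minimal illustration under stated assumptions, not the reference implementation: the function name `msin_to_packed_bcd` and the MSIN value are hypothetical, and the nibble arrangement (earlier digit in bits 1 to 4, later digit in bits 5 to 8) is an assumption chosen to agree with the '1111' filler sitting in bits 5 to 8 of the final octet.

```python
def msin_to_packed_bcd(msin: str) -> bytes:
    """Pack MSIN digits two per octet; pad an odd-length MSIN with 0xF."""
    if not (msin.isascii() and msin.isdigit()):
        raise ValueError("MSIN must contain decimal digits only")
    digits = [int(d) for d in msin]
    if len(digits) % 2:
        digits.append(0xF)  # filler ends up in bits 5 to 8 of the final octet
    out = bytearray()
    for low, high in zip(digits[0::2], digits[1::2]):
        # Assumption: earlier digit in bits 1-4, later digit in bits 5-8.
        out.append((high << 4) | low)
    return bytes(out)


# Illustrative 9-digit MSIN (hypothetical, not taken from the data set).
scheme_input = msin_to_packed_bcd("123456789")

# 256-bit public key + 64-bit MAC + encrypted MSIN of the same length as the input.
scheme_output_bits = 256 + 64 + len(scheme_input) * 8
```

A 9-digit MSIN packs into 5 octets, which matches the 40-bit encrypted MSIN size stated above.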
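# Example Python code to load data into Spark DataFrame
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()  # reuses the active session if one already exists
df = spark.read.format("csv").option("inferSchema","true").option("header","true").option("sep",",").load("5g_suci_using_ecies_profile_a_100k.gz")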
Vibration measurements on a SAG mill drive motor, for energy harvesting or predictive maintenance.
Presented here is a dataset used for our SCADA cybersecurity research. The dataset was built using our SCADA system testbed described in our paper below [*]. The purpose of our testbed was to emulate real-world industrial systems closely. It allowed us to carry out realistic cyber-attacks.
The provided dataset is cleansed, pre-processed, and ready to use. Users may modify it as they wish, but please cite the dataset as below.
M. A. Teixeira, M. Zolanvari, R. Jain, "WUSTL-IIOT-2018 Dataset for ICS (SCADA) Cybersecurity Research," 2018. [Online]. Available: https://www.cse.wustl.edu/~jain/iiot/index.html.
The Message Queuing Telemetry Transport (MQTT) protocol is one of the most widely used standards in Internet of Things (IoT) machine-to-machine communication. The increase in the number of available IoT devices and protocols in use reinforces the need for new and robust Intrusion Detection Systems (IDS). However, building an IoT IDS requires datasets to process, train and evaluate these models. The dataset presented in this paper is the first to simulate an MQTT-based network. The dataset is generated using a simulated MQTT network architecture.
The dataset consists of 5 pcap files, namely, normal.pcap, sparta.pcap, scan_A.pcap, mqtt_bruteforce.pcap and scan_sU.pcap. Each file represents a recording of one scenario: normal operation, Sparta SSH brute-force, aggressive scan, MQTT brute-force and UDP scan respectively. The attack pcap files also contain background normal operations. The attacker IP address is “192.168.2.5”. Basic packet features are extracted from the pcap files into CSV files with the same names as the pcap files. The features include flags, length, MQTT message parameters, etc. Later, unidirectional and bidirectional features are extracted. It is important to note that for the bidirectional flows, some features (marked with *) have two values: one for the forward flow and one for the backward flow. The two values are recorded and distinguished by the prefix “fwd_” for forward and “bwd_” for backward.
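The "fwd_"/"bwd_" naming convention makes it easy to find the two-valued features programmatically. The sketch below is illustrative only: the helper `pair_bidirectional_features` and the header names are hypothetical, and the real CSV column names may differ.

```python
def pair_bidirectional_features(columns):
    """Group 'fwd_x' / 'bwd_x' columns under their shared base name 'x'."""
    pairs = {}
    for name in columns:
        for prefix in ("fwd_", "bwd_"):
            if name.startswith(prefix):
                base = name[len(prefix):]
                pairs.setdefault(base, {})[prefix.rstrip("_")] = name
    # Keep only features that have both a forward and a backward column.
    return {base: p for base, p in pairs.items() if len(p) == 2}


# Hypothetical header of a bidirectional-flow CSV file.
header = ["proto", "fwd_num_pkts", "bwd_num_pkts",
          "fwd_mean_iat", "bwd_mean_iat", "is_attack"]
paired = pair_bidirectional_features(header)
```

Columns without either prefix, such as `proto` in this made-up header, are treated as single-valued features and left out of the result.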
The demo data set consists of the propagation path distances of the AT&T North America network topology. The geographical node positions (latitude and longitude) and the adjacency matrix were obtained from the Internet Topology Zoo, and the data set was formed from the available data. This set has been used for the joint localization problem of controller and hypervisor instances in vSDN-enabled 5G networks.
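The description does not state how the distances were derived from the node positions. One common choice, shown here purely as an assumption, is the great-circle (haversine) distance between adjacent nodes. The function names, node coordinates and adjacency matrix below are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def link_distances(positions, adjacency):
    """Distance matrix holding haversine lengths for adjacent node pairs, else 0."""
    n = len(positions)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if adjacency[i][j]:
                dist[i][j] = haversine_km(*positions[i], *positions[j])
    return dist


# Hypothetical two-node topology: (latitude, longitude) in degrees.
positions = [(40.71, -74.01), (41.88, -87.63)]
adjacency = [[0, 1], [1, 0]]
distances = link_distances(positions, adjacency)
```

Multi-hop propagation path distances could then be obtained by running a shortest-path algorithm over this matrix, again assuming that is how the data set was built.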
Underground UE statistics measurements captured on a u-blox SARA-N211 NB-IoT device in frequency band 20. The signal waveform was captured with a Rohde & Schwarz TSMW device. The samples were taken along approximately 1600 m of the level -2 underground tunnel system under the Lyngby Campus of the Technical University of Denmark.
Dataset used for "A Machine Learning Approach for Wi-Fi RTT Ranging" paper (ION ITM 2019). The dataset includes almost 30,000 Wi-Fi RTT (FTM) raw channel measurements from real-life client and access points, from an office environment. This data can be used for Time of Arrival (ToA), ranging, positioning, navigation and other types of research in Wi-Fi indoor location. The zip file includes a README file, a CSV file with the dataset and several Matlab functions to help the user plot the data and demonstrate how to estimate the range.
Copyright (C) 2018 Intel Corporation
Welcome to the Intel WiFi RTT (FTM) 40MHz dataset.
The paper and the dataset can be downloaded from:
To cite the dataset and code, or for further details, please use:
Nir Dvorecki, Ofer Bar-Shalom, Leor Banin, and Yuval Amizur, "A Machine Learning Approach for Wi-Fi RTT Ranging," ION Technical Meeting ITM/PTTI 2019
For questions/comments contact:
The zip file contains the following files:
1) This README.txt file.
2) LICENSE.txt file.
3) RTT_data.csv - the dataset of FTM transactions
4) Helper Matlab files:
O mainFtmDatasetExample.m - main function to run in order to execute the Matlab example.
O PlotFTMchannel.m - plots the channels of a single FTM transaction.
O PlotFTMpositions.m - plots user and Access Point (AP) positions.
O ReadFtmMeasFile.m - reads the RTT_data.csv file to numeric Matlab matrix.
O SimpleFTMrangeEstimation.m - execute a simple range estimation on the entire dataset.
O Office1_40MHz_VenueFile.mat - contains a map of the office from which the dataset was gathered.
Running the Matlab example:
In order to run the Matlab simulation, extract the contents of the zip file and call the mainFtmDatasetExample() function from Matlab.
Contents of the dataset:
The RTT_data.csv file contains a header row, followed by 29581 rows of FTM transactions.
The first column of the header row includes an extra "%" at the beginning, so that the entire csv file can be easily loaded into Matlab using the command: load('RTT_data.csv')
Indexing the csv columns from 1 (leftmost column) to 467 (rightmost column):
O column 1 - Timestamp of each measurement (sec)
O columns 2 to 4 - Ground truth (GT) position of the client at the time the measurement was taken (meters, in local frame)
O column 5 - Range, as estimated by the devices in real time (meters)
O columns 6 to 8 - Access Point (AP) position (meters, in local frame)
O column 9 - AP index/number, according to the convention of the ION ITM 2019 paper
O column 10 - Ground truth range between the AP and client (meters)
O column 11 - Time of Departure (ToD) factor in meters, such that: TrueRange = (ToA_client + ToA_AP)*3e8/2 + ToD_factor (eq. 7 in the ION ITM paper, with "ToA" being tau_0, and with "ToD_factor" lumping together both nu initiator and nu responder)
O columns 12 to 467 - Complex channel estimates. Each channel contains 114 complex numbers denoting the frequency response of the channel at each WiFi tone:
    O columns 12 to 125 - Complex channel estimates for first antenna from the client device
    O columns 126 to 239 - Complex channel estimates for second antenna from the client device
    O columns 240 to 353 - Complex channel estimates for first antenna from the AP device
    O columns 354 to 467 - Complex channel estimates for second antenna from the AP device
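For users working outside Matlab, the column layout and the range relation of column 11 can be sketched in Python. This is an illustration under assumptions, not part of the provided helper files: the names `CHANNEL_COLUMNS`, `one_based_range_to_slice` and `true_range_m` are hypothetical, and the ToA values are assumed to be in seconds.

```python
SPEED_OF_LIGHT = 3e8  # m/s, the value used in the formula for column 11


def one_based_range_to_slice(first, last):
    """Convert a 1-based inclusive column range into a 0-based Python slice."""
    return slice(first - 1, last)


# Channel-estimate column groups, taken from the column list above.
CHANNEL_COLUMNS = {
    "client_ant1": one_based_range_to_slice(12, 125),
    "client_ant2": one_based_range_to_slice(126, 239),
    "ap_ant1": one_based_range_to_slice(240, 353),
    "ap_ant2": one_based_range_to_slice(354, 467),
}


def true_range_m(toa_client_s, toa_ap_s, tod_factor_m):
    """TrueRange = (ToA_client + ToA_AP) * 3e8 / 2 + ToD_factor, in meters."""
    return (toa_client_s + toa_ap_s) * SPEED_OF_LIGHT / 2 + tod_factor_m


# Made-up values: 10 ns ToA on each side and a 0.5 m ToD factor.
example_range = true_range_m(10e-9, 10e-9, 0.5)
```

Each slice selects 114 values from a parsed row, one per WiFi tone.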
The tone frequencies are given by: 312.5E3*[-58:-2, 2:58] Hz (e.g. column 12 of the csv contains the channel response at frequency fc-18.125MHz, where fc is the carrier wave frequency).
Note that the 3 tones around the baseband DC (i.e. around the frequency of the carrier wave), as well as the guard tones, are not included.
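The tone frequency expression above is written in Matlab notation. An equivalent Python sketch is given below; the variable names are illustrative only.

```python
TONE_SPACING_HZ = 312.5e3

# Matlab's [-58:-2, 2:58] is inclusive at both ends; Python's range() is not.
tone_indices = list(range(-58, -1)) + list(range(2, 59))

# Frequency offset of each tone relative to the carrier frequency fc, in Hz.
tone_offsets_hz = [k * TONE_SPACING_HZ for k in tone_indices]
```

The list has 114 entries, one per channel-estimate column of an antenna, and it skips the indices -1, 0 and 1, which are the 3 tones around DC mentioned above.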