The following data set is modelled after the implementers’ test data in 3GPP TS 33.501 “Security architecture and procedures for 5G System” and uses the same terminology. The data set corresponds to SUCI (Subscription Concealed Identifier) computation in the 5G UE (User Equipment) for IMSI (International Mobile Subscriber Identity) based SUPI (Subscription Permanent Identifier) and ECIES Profile A. The MCC|MNC part of the IMSI is '274012'.

In the 5G system, the globally unique 5G subscription permanent identifier is called SUPI, as defined in 3GPP TS 23.501. For privacy reasons, the SUPI from the 5G devices should not be transferred in clear text, and is instead concealed inside the privacy-preserving SUCI. Consequently, the SUCI protects the privacy of the SUPI over the air in the 5G radio network. For SUCIs containing an IMSI based SUPI, the UE in essence conceals the MSIN (Mobile Subscriber Identification Number) part of the IMSI. On the 5G operator side, the SIDF (Subscription Identifier De-concealing Function) of the UDM (Unified Data Management) is responsible for de-concealment of the SUCI and resolves the SUPI from the SUCI based on the protection scheme used to generate the SUCI.

The SUCI protection scheme used in this data set is ECIES Profile A. The scheme-output consists of a 256-bit public key, a 64-bit MAC and a 40-bit encrypted MSIN. The SUCI scheme-input MSIN is coded as hexadecimal digits using packed BCD coding where the order of digits within an octet is the same as the order of MSIN. As the MSINs have an odd number of digits, bits 5 to 8 of the final octet are coded as ‘1111’.
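The coding and sizes described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the nibble order (first digit of each pair in bits 1 to 4, second digit in bits 5 to 8, so that the filler lands in bits 5 to 8 of the final octet) is assumed rather than taken from the data set, and the function name `msin_to_packed_bcd` and the sample MSIN are hypothetical.

```python
def msin_to_packed_bcd(msin: str) -> bytes:
    """Pack decimal MSIN digits two per octet (assumed nibble order, see text)."""
    if not msin.isdigit():
        raise ValueError("MSIN must contain decimal digits only")
    digits = [int(d) for d in msin]
    if len(digits) % 2:
        digits.append(0xF)  # filler nibble '1111' for an odd number of digits
    octets = bytearray()
    for first, second in zip(digits[0::2], digits[1::2]):
        # Assumption: first digit -> bits 1-4, second digit -> bits 5-8.
        octets.append((second << 4) | first)
    return bytes(octets)


PUBLIC_KEY_BITS = 256  # public key size stated in the text
MAC_BITS = 64          # MAC size stated in the text

example_msin = "001002086"  # hypothetical 9-digit MSIN, for illustration only
packed = msin_to_packed_bcd(example_msin)
encrypted_msin_bits = 8 * len(packed)  # ciphertext assumed as long as the packed MSIN
scheme_output_bits = PUBLIC_KEY_BITS + MAC_BITS + encrypted_msin_bits

print(packed.hex(), encrypted_msin_bits, scheme_output_bits)
```

For a 9-digit MSIN the packed form occupies 5 octets, which is consistent with the 40-bit encrypted MSIN quoted above and gives a scheme-output of 360 bits in total.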

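# Example Python code to load data into Spark DataFrame

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()  # reuse or create a Spark session
df = spark.read.format("csv").option("inferSchema","true").option("header","true").option("sep",",").load("5g_suci_using_ecies_profile_a_100k.gz")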


Secure cryptographic protocols are indispensable for modern communication systems. In cryptography, such security is realized through encryption. In quantum cryptography, Quantum Key Distribution (QKD) is a widely used quantum communication scheme that enables two parties to establish a shared secret key that can be used to encrypt and decrypt messages.


Presented here is a dataset used for our SCADA cybersecurity research. The dataset was built using our SCADA system testbed described in our paper below [*]. The purpose of our testbed was to emulate real-world industrial systems closely. It allowed us to carry out realistic cyber-attacks.



The provided dataset is cleaned, pre-processed, and ready to use. Users may modify it as they wish, but please cite the dataset as below.

M. A. Teixeira, M. Zolanvari, R. Jain, "WUSTL-IIOT-2018 Dataset for ICS (SCADA) Cybersecurity Research," 2018. [Online]. Available:


We introduce a new database of voice recordings with the goal of supporting research on vulnerabilities and protection of voice-controlled systems (VCSs). In contrast to prior efforts, the proposed database contains both genuine voice commands and replayed recordings of such commands, collected in realistic VCS usage scenarios and using modern voice assistant development kits.


The corpus consists of three sets: the core, evaluation, and complete set. The complete set contains all the data (i.e., complete set = core set + evaluation set) and allows the user to freely split the training/test set. The core/evaluation sets suggest a default training/test split. For each set, all *.wav files are in the /data directory and the meta information is in the meta.csv file. The protocol is described in the readme.txt. A PyTorch data loader script is provided as an example of how to use the data. A Python resampling script is provided for resampling the dataset to the desired sample rate.


The Message Queuing Telemetry Transport (MQTT) protocol is one of the most widely used standards in Internet of Things (IoT) machine-to-machine communication. The increase in the number of available IoT devices and protocols in use reinforces the need for new and robust Intrusion Detection Systems (IDS). However, building an IoT IDS requires the availability of datasets to process, train and evaluate these models. The dataset presented in this paper is the first to simulate an MQTT-based network. The dataset is generated using a simulated MQTT network architecture.


The dataset consists of 5 pcap files, namely, normal.pcap, sparta.pcap, scan_A.pcap, mqtt_bruteforce.pcap and scan_sU.pcap. Each file represents a recording of one scenario: normal operation, Sparta SSH brute-force, aggressive scan, MQTT brute-force and UDP scan, respectively. The attack pcap files also contain background normal operations. The attacker IP address is “”. Basic packet features are extracted from the pcap files into CSV files with the same names as the pcap files. The features include flags, length, MQTT message parameters, etc. Unidirectional and bidirectional flow features are then extracted. It is important to note that for the bidirectional flows, some features (marked with *) have two values: one for the forward flow and one for the backward flow. The two values are recorded as separate features, distinguished by the prefix “fwd_” for forward and “bwd_” for backward.
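The file naming and the “fwd_”/“bwd_” convention described above can be sketched in Python as follows. The scenario mapping comes from the description; the helper names `csv_name` and `pair_directional_columns` and the example column names are hypothetical and only illustrate the prefix convention, since the actual feature names are not listed here.

```python
# Scenario recorded in each capture, keyed by the pcap file stem (from the text).
SCENARIOS = {
    "normal": "normal operation",
    "sparta": "Sparta SSH brute-force",
    "scan_A": "aggressive scan",
    "mqtt_bruteforce": "MQTT brute-force",
    "scan_sU": "UDP scan",
}


def csv_name(pcap_name: str) -> str:
    """Return the CSV file name that shares its stem with the given pcap file."""
    stem = pcap_name.rsplit(".", 1)[0]
    return stem + ".csv"


def pair_directional_columns(columns):
    """Map each base feature to its (forward, backward) column names."""
    fwd = {c[len("fwd_"):] for c in columns if c.startswith("fwd_")}
    bwd = {c[len("bwd_"):] for c in columns if c.startswith("bwd_")}
    return {base: ("fwd_" + base, "bwd_" + base) for base in sorted(fwd & bwd)}


# Hypothetical column names, used only to demonstrate the pairing.
example_columns = ["ip_src", "fwd_num_pkts", "bwd_num_pkts",
                   "fwd_mean_iat", "bwd_mean_iat"]
pairs = pair_directional_columns(example_columns)

for stem, scenario in SCENARIOS.items():
    print(csv_name(stem + ".pcap"), "->", scenario)
print(pairs)
```

Pairing the columns by base name makes it easy to compare the forward and backward values of the same feature for a flow.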



This dataset accompanies the article "Palisade: A Framework for Anomaly Detection in Embedded Systems."  It contains traces, programs, and specifications used in the case studies from the paper.


Case Study 1: Autonomous Vehicle - Comparison between Siddhi and Palisade nfer processor

  • cs1_gear_flip_flop_data.csv - the data used in the Gear Flip-Flop anomaly study and the comparison with Siddhi
  • cs1_comparison.nfer - the nfer specification used in the comparison with Siddhi
  • cs1_comparison.siddhi - the Siddhi specification used in the comparison with Siddhi


Case Study 2: ADAS-on-a-treadmill - Comparison between Beep Beep 3 and Palisade rangeCheck and lossDetect processors

  • cs2_platoon_dead_spot_data.csv - the data used in the Platoon Dead-Spot anomaly study and the comparison with Beep Beep 3
  • cs2_platoon_no_anomaly_data.csv - data used for training in the Platoon Dead-Spot anomaly study
  • cs2_platoon_range_model.json - trained model used by the rangeCheck processor
  • - Beep Beep 3 program to check both range and loss
  • - Beep Beep 3 program to print events
  • - Beep Beep 3 program to read from a file and publish events to the RangeCheck program
  • - Custom Beep Beep 3 event class used in the comparison



The Costas condition on a permutation matrix, expressed as row indices as elements of a vector c, can be expressed as A*c=b, where b is a vector of integers in which no element is zero.  A particular formulation of the matrix A allows a singular value decomposition in which the eigenvalues are squared integers and the eigenvectors may be scaled to vectors with all integer elements.  This is a database of the Costas constraint matrices A, the scaled eigenvectors, and the squared eigenvalues for orders 3 through 100.
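To make the A*c=b form concrete, here is a minimal Python sketch of one plausible constraint matrix. It is an assumption, not necessarily the particular formulation of A stored in the database: each row encodes the requirement that two difference pairs with the same shift k are unequal, i.e. (c[i+k] - c[i]) - (c[j+k] - c[j]) is nonzero. The function names are hypothetical.

```python
from itertools import combinations


def costas_constraint_rows(n):
    """Build rows of an assumed constraint matrix A for order n."""
    rows = []
    for k in range(1, n - 1):  # shifts that have at least two difference pairs
        for i, j in combinations(range(n - k), 2):
            row = [0] * n
            row[i + k] += 1
            row[i] -= 1
            row[j + k] -= 1
            row[j] += 1
            rows.append(row)
    return rows


def is_costas(c):
    """Return True if every element of b = A*c is nonzero."""
    rows = costas_constraint_rows(len(c))
    b = [sum(a * x for a, x in zip(row, c)) for row in rows]
    return all(value != 0 for value in b)


print(costas_constraint_rows(3))  # single row for order 3
print(is_costas([1, 3, 2]))       # a Costas permutation of order 3
print(is_costas([1, 2, 3]))       # repeated difference, so not Costas
```

Under this assumed formulation, order 3 yields the single row [-1, 2, -1], and the permutation [1, 2, 3] fails because that row gives b = 0.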


Please refer to the file CC_SVD_Database_Readme.pdf for instructions on the format of the database, and its use.  The database contains one file for each order.  The files are CSV files in which each line ends with a comma, then a plain text remark that explains that line.
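The line layout described above (data fields, then a comma, then a plain-text remark) could be read with a small Python helper such as the following sketch. It assumes that the remark itself contains no comma; the function name `split_remark` and the sample line are hypothetical, and the Readme remains the authoritative description of the format.

```python
def split_remark(line: str):
    """Split one database line into its data fields and the trailing remark."""
    # Everything after the last comma is treated as the remark (assumption:
    # the remark text contains no comma of its own).
    data, _, remark = line.rstrip("\r\n").rpartition(",")
    values = [field.strip() for field in data.split(",")] if data else []
    return values, remark.strip()


# Hypothetical sample line, for illustration only.
values, remark = split_remark("4,-2,1, example remark\n")
print(values, "|", remark)
```

The fields are returned as strings so that the caller can convert them according to the format documented in the Readme.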


The supplementary files of our submitted TIFS paper: "CALPA-NET: Channel-pruning-assisted Deep Residual Network for Steganalysis of Digital Images".


This dataset contains "Pristine" and "Distorted" videos recorded in different places. The distortions with which the videos were recorded are "Focus", "Exposure" and "Focus + Exposure", each at low (1), medium (2) and high (3) levels, forming a total of 10 conditions (including Pristine videos). In addition, distorted videos were exported in three different qualities according to the H.264 compression format used in the DIGIFORT software, which were: High Quality (HQ, H.264 at 100%), Medium Quality (MQ, H.264 at 75%) and Low Quality



0. This dataset is intended for evaluating "Visual Quality Assessment" (VQA) and "Visual Object Tracking" (VOT) algorithms. It has 4476 videos with different distortions and their Bounding Box annotations ([x(x coordinate) y(y coordinate) w(width) h(height)]) for each frame. It also contains a MATLAB script that generates the video sequences for evaluating VOT algorithms.


1. Move the "generateSequences.m" file to the "surveillanceVideosDataset" Folder.


2. Open the script and modify the following parameters as needed:




%Sequence settings and images nomenclature
imagesType = '.jpg';
imgFolder = 'img';
gtName = 'groundtruth.txt';
imgNomenclature = ['%04d' imagesType];




This configuration creates a folder like the following for each video:


0001SequenceExample (Folder)
- - img (Folder)
- - - - 0001.jpg (Image)
- - - - 0002.jpg (Image)
- - - - ....
- - - - 0451.jpg (Image)
- - groundtruth.txt (txt file: Bounding Box Annotations)


3. Press "Run" and wait until the sequences are built. The process can take a long time due to the number of videos. You will need 33 GB for the videos, 30 MB for the Bounding Box annotations and 230 GB for the sequences (.jpg format).
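Once the sequences are built, the frame names and annotations can be read with a few lines of Python, sketched below. The '%04d' + '.jpg' naming follows the script settings shown above; the assumption that each line of groundtruth.txt holds the four values x, y, w, h separated by spaces or commas (optionally in square brackets) is not confirmed by the description, and the function names are hypothetical.

```python
import re


def frame_name(index: int, images_type: str = ".jpg") -> str:
    """Build a frame file name using the '%04d' nomenclature from the script."""
    return "%04d" % index + images_type


def parse_bounding_box(line: str):
    """Parse one annotation line into (x, y, w, h) as floats.

    Assumption: four numbers per line, separated by spaces or commas,
    optionally wrapped in square brackets.
    """
    cleaned = line.strip().strip("[]")
    fields = [f for f in re.split(r"[,\s]+", cleaned) if f]
    if len(fields) != 4:
        raise ValueError("expected 4 values (x y w h), got %d" % len(fields))
    x, y, w, h = (float(f) for f in fields)
    return x, y, w, h


# Hypothetical annotation line, for illustration only.
print(frame_name(1), parse_bounding_box("10 20 30 40"))
```

Pairing `frame_name(i)` with the i-th annotation line gives the image and bounding box for each frame of a sequence, assuming one annotation line per frame.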






Truth discovery techniques, which can obtain accurate aggregation results based on the weighted sensory data of users, are widely adopted in industrial sensing systems. However, there are some privacy issues that cannot be ignored in the truth discovery process. While most existing privacy-preserving truth discovery methods focus on the privacy of sensory data, they may neglect to protect another equally important piece of information: the tagged location information.