Datasets consisting of features extracted from API calls, network activity and Cuckoo signatures of 19994 public files of different types.


The Costas condition on a permutation matrix, with the row indices expressed as the elements of a vector c, can be written as A*c=b, where b is a vector of integers in which no element is zero.  A particular formulation of the matrix A allows a singular value decomposition in which the eigenvalues are squared integers and the eigenvectors may be scaled to vectors with all integer elements.  This is a database of the Costas constraint matrices A, the scaled eigenvectors, and the squared eigenvalues for orders 3 through 100.
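To make the A*c=b condition concrete, here is a minimal sketch. It builds one possible constraint matrix, which is an assumption for illustration and not necessarily the formulation stored in the database. Each row compares the two differences c[i+d]-c[i] and c[j+d]-c[j] taken at the same shift d, so a zero in b marks a repeated displacement vector. The function names are hypothetical.

```python
from itertools import combinations


def costas_constraint_matrix(n):
    """Build an illustrative constraint matrix A for order n.

    Each row encodes (c[i+d] - c[i]) - (c[j+d] - c[j]) for one pair i < j
    at one shift d. This is an assumed formulation, not the database's.
    """
    rows = []
    for d in range(1, n - 1):  # shifts that yield at least two differences
        for i, j in combinations(range(n - d), 2):
            row = [0] * n
            row[i + d] += 1
            row[i] -= 1
            row[j + d] -= 1
            row[j] += 1
            rows.append(row)
    return rows


def is_costas(c):
    """Return True if the permutation c satisfies A*c = b with no zero in b."""
    a = costas_constraint_matrix(len(c))
    b = [sum(coef * value for coef, value in zip(row, c)) for row in a]
    return all(entry != 0 for entry in b)


print(is_costas([1, 3, 2]))  # True
print(is_costas([1, 2, 3]))  # False
```

The check assumes that c is already a permutation; it only tests the distinct-difference part of the condition.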


Please refer to the file CC_SVD_Database_Readme.pdf for instructions on the format of the database, and its use.  The database contains one file for each order.  The files are CSV files in which each line ends with a comma, then a plain text remark that explains that line.
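As a rough illustration of the line layout described above, the sketch below separates the data fields of one line from its trailing remark. It assumes that the remark itself contains no commas; the sample line in the usage example is made up, and the readme PDF remains the reference for the actual field layout.

```python
def split_remark(line):
    """Split one database line into its data fields and its trailing remark.

    Assumes the remark is the text after the LAST comma and contains no
    commas itself (an assumption; check the readme for the real layout).
    """
    data, _, remark = line.rstrip("\r\n").rpartition(",")
    fields = [field.strip() for field in data.split(",")] if data else []
    return fields, remark.strip()


# Hypothetical line, for illustration only.
fields, remark = split_remark("1,-2,1, example remark\n")
print(fields)  # ['1', '-2', '1']
print(remark)  # example remark
```

The fields are kept as strings so that the caller can decide how to convert them.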


The supplementary files of our submitted TIFS paper: "CALPA-NET: Channel-pruning-assisted Deep Residual Network for Steganalysis of Digital Images".


This Dataset contains "Pristine" and "Distorted" videos recorded in different places. The distortions with which the videos were recorded are: "Focus", "Exposure" and "Focus + Exposure". Each of the three was recorded at low (1), medium (2) and high (3) levels, forming a total of 10 conditions (including Pristine videos). In addition, distorted videos were exported in three different qualities according to the H.264 compression format used in the DIGIFORT software, which were: High Quality (HQ, H.264 at 100%), Medium Quality (MQ, H.264 at 75%) and Low Quality 



0. This Dataset is intended to evaluate "Visual Quality Assessment" (VQA) and "Visual Object Tracking" (VOT) algorithms. It has 4476 videos with different distortions and their Bounding Box annotations ([x y w h]: x coordinate, y coordinate, width, height) for each frame. It also contains a MATLAB script that generates the video sequences for evaluating VOT algorithms.


1. Move the "generateSequences.m" file to the "surveillanceVideosDataset" Folder.


2. Open the script and modify the following parameters as needed:




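% Sequence settings and image nomenclature
imagesType = '.jpg';                      % image file extension
imgFolder = 'img';                        % subfolder that holds the frames
gtName = 'groundtruth.txt';               % name of the Bounding Box annotation file
imgNomenclature = ['%04d' imagesType];    % zero-padded frame name, e.g. 0001.jpg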




The configuration above creates a folder like this for each video:


0001SequenceExample (Folder)

- - img (Folder)

- - - - 0001.jpg (Image)

- - - - 0002.jpg (Image)

- - - - ....

- - - - ....

- - - - ....

- - - - 0451.jpg (Image)

- - groundtruth.txt (txt file: Bounding Box Annotations)


3. Press "Run" and wait until the sequences are built. The process can take a long time due to the number of videos. You will need 33 GB for the videos, 30 MB for the Bounding Box annotations and 230 GB for the sequences (.jpg format).
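Once the sequences are built, a folder with the layout shown above can be read back for a tracker. The Python sketch below is not part of the dataset. It assumes that line k of groundtruth.txt holds the four numbers x, y, w, h for frame k, that frames are numbered from 0001, and that the numbers are separated by spaces or commas. The function name load_sequence is hypothetical.

```python
import re
from pathlib import Path

NUMBER = re.compile(r"[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?")


def load_sequence(seq_dir, img_folder="img", gt_name="groundtruth.txt",
                  images_type=".jpg"):
    """Pair each frame path with its (x, y, w, h) Bounding Box annotation."""
    seq_dir = Path(seq_dir)
    boxes = []
    for line in (seq_dir / gt_name).read_text().splitlines():
        if not line.strip():
            continue  # skip empty lines
        values = [float(v) for v in NUMBER.findall(line)]
        if len(values) != 4:
            raise ValueError(f"expected 4 numbers per line, got: {line!r}")
        boxes.append(tuple(values))
    # Frame names follow the '%04d' pattern from the script settings.
    frames = [seq_dir / img_folder / f"{i:04d}{images_type}"
              for i in range(1, len(boxes) + 1)]
    return list(zip(frames, boxes))
```

A call such as load_sequence("0001SequenceExample") would return a list of (frame path, box) pairs. The frame files are not opened, so the function only needs the annotation file to exist.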






Truth discovery techniques, which can obtain accurate aggregation results based on the weighted sensory data of users, are widely adopted in industrial sensing systems. However, there are privacy issues in the truth discovery process that cannot be ignored. While most existing privacy-preserving truth discovery methods focus on the privacy of sensory data, they may neglect to protect another equally important piece of information: the tagged location information.



To cover the existing gap in behavioral datasets that model the interactions of users with individual and multiple devices in a Smart Office, with the aim of later authenticating those users continuously, we publish the following collection of datasets. It was generated by having five users interact for 60 days with their personal computers and mobile devices. Below you can find a brief description of each dataset.



While social media has proven to be an exceptionally useful tool for interacting with other people and for spreading helpful information quickly and at scale, its great potential has also been leveraged with ill intent to distort political elections and manipulate constituents. In the paper at hand, we analyzed the presence and behavior of social bots on Twitter in the context of the November 2019 Spanish general election.


Data have been exported in three formats to provide the maximum flexibility:

  • MongoDB Dump BSONs
    • To import these data, please refer to the official MongoDB documentation.
  • JSON Exports
    • Both the users and the tweets collections have been exported as canonical JSON files. 
  • CSV Exports (only tweets)
    • The tweet collection has been exported as a plain CSV file with comma separators.
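For the JSON exports, the sketch below shows one way to read them with the Python standard library only. It assumes that "canonical JSON" means MongoDB Extended JSON in canonical mode, where values are wrapped as {"$numberInt": "3"} or {"$date": {"$numberLong": "..."}}, and that each line holds one document. Both points are assumptions about the export, and the field names in the usage example are hypothetical.

```python
import json
from datetime import datetime, timezone


def _unwrap(obj):
    """Convert the most common canonical Extended JSON wrappers."""
    if len(obj) == 1:
        (key, value), = obj.items()
        if key in ("$numberInt", "$numberLong"):
            return int(value)
        if key == "$numberDouble":
            return float(value)
        if key == "$oid":
            return value  # keep the ObjectId as its hex string
        if key == "$date" and isinstance(value, int):
            # Milliseconds since the Unix epoch, already unwrapped to int.
            return datetime.fromtimestamp(value / 1000, tz=timezone.utc)
    return obj


def read_canonical_json(lines):
    """Yield one plain Python dict per non-empty line."""
    for line in lines:
        if line.strip():
            yield json.loads(line, object_hook=_unwrap)


sample = ['{"_id": {"$oid": "5dd0"}, "retweet_count": {"$numberInt": "3"}}']
print(list(read_canonical_json(sample)))
# [{'_id': '5dd0', 'retweet_count': 3}]
```

Wrappers that the sketch does not know about are left as nested dictionaries, so no information is lost.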

Cyber-Physical Production Systems (CPPS) are a key enabler of industrial business and economic growth. The introduction of the Internet of Things (IoT) into industrial processes represents a new Internet revolution, commonly known as the 4th Industrial Revolution, that moves towards the Smart Manufacturing concept. Despite the strong interest of industry in innovating its production systems in order to increase revenues at lower costs, the IoT concept is still immature and fuzzy, which increases security-related risks in industrial systems.


The dataset containing OPC UA traffic was generated by setting up and running a laboratory CPPS testbed. This CPPS uses the OPC UA standard for horizontal and vertical communications. The testbed consists of seven nodes in the network. Each network node consists of a Raspberry Pi device running the Python FreeOpcUa implementation. In this configuration, there are two production units, each one containing three devices, and one node representing a Manufacturing Execution System (MES). Each device implements both an OPC UA server and a client: the server publishes sensor-reading updates to an OPC UA variable, and the client subscribes to all OPC UA variables from all other devices in the same production unit. The MES, in contrast, implements only the OPC UA client, which subscribes to all OPC UA variables from all devices in both production units. An attack node is also connected to this network, since it is assumed that the attacker has already gained access to the CPPS network.

After setting up the CPPS testbed, a Python script that uses Tshark was used to capture OPC UA packets and export this traffic to a dataset in CSV format. This traffic includes both normal and anomalous behaviour. Anomalous behaviour is produced by the malicious node, which injects attacks into the CPPS network, targeting one or more device nodes and the MES. The attacks selected for the malicious activities are:

    • Denial of Service (DoS);
    • Eavesdropping or Man-in-the-middle (MITM) attacks;
    • Impersonation or Spoofing attacks.


To perform the attacks mentioned, a Python script is used, which relies on the Scapy module for packet sniffing, injection and modification. Regarding the dataset generation, another Python script, which uses Tshark (in this case through Pyshark), was used to capture only OPC UA packets and export this traffic to a dataset in CSV format. The OPC UA packets are converted to bidirectional communication flows, which are characterized by the following 32 features:

    • src_ip: Source IP address;
    • src_port: Source port;
    • dst_ip: Destination IP address;
    • dst_port: Destination port;
    • flags: TCP flag status;
    • pktTotalCount: Total packet count;
    • octetTotalCount: Total packet size;
    • avg_ps: Average packet size;
    • proto: Protocol;
    • service: OPC UA service call type;
    • service_errors: Number of service errors in OPC UA request responses;
    • status_errors: Number of status errors in OPC UA request responses;
    • msg_size: OPC UA message transport size;
    • min_msg_size: minimum OPC UA message size;
    • flowStart: Timestamp of flow start;
    • flowEnd: Timestamp of flow end;
    • flowDuration: Flow duration in seconds;
    • avg_flowDuration: Average flow duration in seconds;
    • flowInterval: Time interval between flows in seconds;
    • count: Number of connections to the same destination host as the current connection in the past two seconds;
    • srv_count: Number of connections to the same port number as the current connection in the past two seconds;
    • same_srv_rate: The percentage of connections that were to the same port number, among the connections aggregated in Count;
    • dst_host_same_src_port_rate: The percentage of connections that were to the same source port, among the connections having the same port number;
    • f_pktTotalCount: Total forward packets count;
    • f_octetTotalCount: Total forward packets size;
    • f_flowStart: Timestamp of first forward packet start;
    • f_rate: Rate at which forward packets are transmitted;
    • b_pktTotalCount: Total backwards packets count;
    • b_octetTotalCount: Total backwards packets size;
    • b_flowStart: Timestamp of first backwards packet start;
    • label: Binary label classification;
    • multi_label: Multi classification labeling.
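To illustrate how packets can be turned into bidirectional flows with some of the features listed above, here is a minimal sketch. It is not the authors' script. It assumes that each packet is a tuple (timestamp, src_ip, src_port, dst_ip, dst_port, size), that the direction of the first packet seen defines "forward", and that a flow is identified only by its two endpoints. It computes only a subset of the 32 features.

```python
def aggregate_flows(packets):
    """Group packets into bidirectional flows and compute basic counters."""
    flows = {}
    for ts, src_ip, src_port, dst_ip, dst_port, size in packets:
        a, b = (src_ip, src_port), (dst_ip, dst_port)
        key = (a, b) if a <= b else (b, a)  # same key for both directions
        flow = flows.get(key)
        if flow is None:
            flow = flows[key] = {
                "src_ip": src_ip, "src_port": src_port,
                "dst_ip": dst_ip, "dst_port": dst_port,
                "pktTotalCount": 0, "octetTotalCount": 0,
                "f_pktTotalCount": 0, "f_octetTotalCount": 0,
                "b_pktTotalCount": 0, "b_octetTotalCount": 0,
                "flowStart": ts, "flowEnd": ts,
            }
        # Forward means "same direction as the first packet of the flow".
        forward = a == (flow["src_ip"], flow["src_port"])
        prefix = "f_" if forward else "b_"
        flow["pktTotalCount"] += 1
        flow["octetTotalCount"] += size
        flow[prefix + "pktTotalCount"] += 1
        flow[prefix + "octetTotalCount"] += size
        flow["flowStart"] = min(flow["flowStart"], ts)
        flow["flowEnd"] = max(flow["flowEnd"], ts)
    for flow in flows.values():
        flow["flowDuration"] = flow["flowEnd"] - flow["flowStart"]
        flow["avg_ps"] = flow["octetTotalCount"] / flow["pktTotalCount"]
    return list(flows.values())
```

Features such as count, srv_count and same_srv_rate need a sliding two-second window over the finished flows and are left out of this sketch.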


The generated dataset has 33,567 normal instances, 74,013 DoS attack instances, 50 impersonation attack instances, and 7 MITM attack instances. This gives a total of 107,634 instances. For the binary label, all attacks were grouped into one class (anomaly, 1) and the rest of the instances belong to the normal class (0).
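The class distribution can be checked directly from the CSV file. The sketch below counts the values of the label and multi_label columns named in the feature list. It assumes that the file has a header row with those column names; the file name in the usage note is hypothetical.

```python
import csv
from collections import Counter


def count_labels(csv_file):
    """Count the values of the 'label' and 'multi_label' columns."""
    binary, multi = Counter(), Counter()
    for row in csv.DictReader(csv_file):
        binary[row["label"]] += 1
        multi[row["multi_label"]] += 1
    return binary, multi
```

A typical call would open the file with open("opcua_dataset.csv", newline="") and pass the file object to count_labels. The values are kept as strings because the encoding of multi_label is not described here.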

For more information, please contact the author: Rui Pinto (


A simple dataset that gives the processing cost (in cycles) for verifying multiple messages signed with ECDSA and implicitly certified public keys. It considers two implicit certification models: ECQV and SIMPL. 


This dataset is used in the article "Schnorr-based implicit certification: improving the security and efficiency of vehicular communications", submitted to IEEE Transactions on Computers. Namely, it is used as the basis for building that article's Figure 2.


This dataset was created for the following paper:


PX4 Autopilot (v1.10.1 stable) ( is used for all experiments, running on Pixhawk 4 flight controller for HITL. QGroundControl (v4.0.9) is used for GCS (

Telemetry data is contained in TLOG files (

Full flight data is contained in ULOG files (

It is useful to use ulog2csv to extract more information in CSV format:
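As a sketch of how the conversion could be scripted, the Python helper below builds and runs a ulog2csv command. It assumes that ulog2csv is the command-line tool installed with the pyulog package and that it accepts -o to choose the output directory. The function names and the file names in the usage note are hypothetical.

```python
import shutil
import subprocess


def ulog2csv_command(ulog_path, output_dir):
    """Build the command line for one ULOG file (assumes the -o option)."""
    return ["ulog2csv", "-o", str(output_dir), str(ulog_path)]


def convert_ulog(ulog_path, output_dir):
    """Run ulog2csv on one file; raises if the tool is not on the PATH."""
    if shutil.which("ulog2csv") is None:
        raise RuntimeError("ulog2csv not found; it is installed with pyulog")
    subprocess.run(ulog2csv_command(ulog_path, output_dir), check=True)
```

For example, convert_ulog("flight.ulg", "csv_out") would write the CSV files for that flight into csv_out.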

GPS spoofing attacks are carried out for 30 seconds. The attacks are done by stopping normal GPS communications, then injecting false readings via Gazebo. This is done by a modification of sitl_gazebo in gazebo_gps_plugin.cpp (