This dataset accompanies the article "Palisade: A Framework for Anomaly Detection in Embedded Systems." It contains traces, programs, and specifications used in the case studies from the paper.
Case Study 1: Autonomous Vehicle - Comparison between Siddhi and Palisade nfer processor
- cs1_gear_flip_flop_data.csv - the data used in the Gear Flip-Flop anomaly study and the comparison with Siddhi
- cs1_comparison.nfer - the nfer specification used in the comparison with Siddhi
- cs1_comparison.siddhi - the siddhi specification used in the comparison with Siddhi
Case Study 2: ADAS-on-a-treadmill - Comparison between Beep Beep 3 and Palisade rangeCheck and lossDetect processors
- cs2_platoon_dead_spot_data.csv - the data used in the Platoon Dead-Spot anomaly study and the comparison with Beep Beep 3
- cs2_platoon_no_anomaly_data.csv - data used for training in the Platoon Dead-Spot anomaly study
- cs2_platoon_range_model.json - trained model used by the rangeCheck processor
- RangeCheck.java - Beep Beep 3 program to check both range and loss
- BenchSink.java - Beep Beep 3 program to print events
- BenchPublisher.java - Beep Beep 3 program to read from a file and publish events to the RangeCheck program
- BenchEvent.java - Custom Beep Beep 3 event class used in the comparison
The Costas condition on a permutation matrix, expressed with the row indices as elements of a vector c, can be written as A*c = b, where b is a vector of integers in which no element is zero. A particular formulation of the matrix A admits a singular value decomposition in which the eigenvalues are squared integers and the eigenvectors may be scaled to vectors with all-integer elements. This is a database of the Costas constraint matrices A, the scaled eigenvectors, and the squared eigenvalues for orders 3 through 100.
Please refer to the file CC_SVD_Database_Readme.pdf for a description of the database format and its use. The database contains one file for each order. The files are CSV files in which each line ends with a comma followed by a plain-text remark explaining that line.
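Independent of the A*c = b formulation above, the Costas condition can be checked directly: all displacement vectors between pairs of points of the permutation must be distinct. A minimal Python sketch (the function name is illustrative):

```python
from itertools import combinations

def is_costas(c):
    """Check whether permutation c (row indices as a list) satisfies the
    Costas condition: every displacement vector (dj, dc) between a pair of
    points must be distinct."""
    vectors = set()
    for i, j in combinations(range(len(c)), 2):
        v = (j - i, c[j] - c[i])
        if v in vectors:
            return False
        vectors.add(v)
    return True

print(is_costas([2, 1, 3, 0]))  # an order-4 Costas permutation -> True
print(is_costas([0, 1, 2, 3]))  # the identity repeats (1, 1) -> False
```

This direct check is O(n^2) in the order n and is convenient for validating entries read from the database files.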
The supplementary files of our submitted TIFS paper: "CALPA-NET: Channel-pruning-assisted Deep Residual Network for Steganalysis of Digital Images".
This dataset contains "Pristine" and "Distorted" videos recorded in different places. The
distortions with which the videos were recorded are "Focus", "Exposure", and "Focus + Exposure",
each at low (1), medium (2), and high (3) levels, giving a total of 10 conditions
(including the Pristine videos). In addition, distorted videos were exported in three different
qualities according to the H.264 compression settings used in the DIGIFORT software:
High Quality (HQ, H.264 at 100%), Medium Quality (MQ, H.264 at 75%), and Low Quality (LQ).
0. This dataset is intended for evaluating "Visual Quality Assessment" (VQA) and "Visual Object
Tracking" (VOT) algorithms. It contains 4476 videos with different distortions and their bounding-box
annotations ([x (x coordinate) y (y coordinate) w (width) h (height)]) for each frame. It also contains
a MATLAB script that generates the video sequences for VOT algorithm evaluation.
1. Move the "generateSequences.m" file to the "surveillanceVideosDataset" folder.
2. Open the script and modify the following parameters according to your needs:
%Sequence settings and images nomenclature %
imagesType = '.jpg'; %
imgFolder = 'img'; %
gtName = 'groundtruth.txt'; %
imgNomenclature = ['%04d' imagesType]; %
This configuration will create a folder structure like the following for each video:
- - img (Folder)
- - - - 0001.jpg (Image)
- - - - 0002.jpg (Image)
- - - - ....
- - - - ....
- - - - ....
- - - - 0451.jpg (Image)
- - groundtruth.txt (txt file: Bounding Box Annotations)
3. Press "Run" and wait until the sequences are built. The process can take a long time due to the
number of videos. You will need 33 GB for the videos, 30 MB for the bounding-box annotations, and 230
GB for the sequences (.jpg format).
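The groundtruth.txt annotation files described above can be loaded with a few lines of Python. This is a sketch assuming one bounding box per frame in [x y w h] order, with values separated by commas or whitespace (the exact delimiter is an assumption; adjust to match the actual files):

```python
def load_groundtruth(path):
    """Parse bounding-box annotations: one box per frame, in [x y w h] order.
    Accepts comma- or whitespace-separated values (the delimiter is an
    assumption, not confirmed by the dataset description)."""
    boxes = []
    with open(path) as f:
        for line in f:
            parts = line.replace(",", " ").split()
            if not parts:
                continue  # skip blank lines
            x, y, w, h = (float(v) for v in parts)
            boxes.append((x, y, w, h))
    return boxes
```

Each returned tuple corresponds to one frame of the sequence (0001.jpg, 0002.jpg, ...), in order.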
Truth discovery techniques, which can obtain accurate aggregation results based on the weighted sensory data of users, are widely adopted in industrial sensing systems. However, there are privacy concerns that cannot be ignored in the truth discovery process. While most existing privacy-preserving truth discovery methods focus on the privacy of sensory data, they neglect to protect another, equally important piece of information: the tagged location information.
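As background, a typical truth discovery loop alternates between estimating the truth as a weighted aggregate of the users' readings and updating each user's weight from its distance to that estimate. The sketch below is an illustrative CRH-style variant for scalar readings, not the protocol of any specific paper:

```python
import math

def truth_discovery(observations, iters=10):
    """Illustrative truth discovery loop (a CRH-style sketch, not a specific
    paper's method). observations maps a user id to a scalar reading.
    Alternates: (1) truth = weighted mean of readings; (2) weight of each
    user decreases with the user's squared error against the truth."""
    users = list(observations)
    weights = {u: 1.0 for u in users}
    truth = 0.0
    for _ in range(iters):
        # truth estimate: weighted mean of the sensory readings
        total_w = sum(weights.values())
        truth = sum(weights[u] * observations[u] for u in users) / total_w
        # weight update: users with smaller error get larger weight
        errors = {u: (observations[u] - truth) ** 2 for u in users}
        total_e = sum(errors.values()) or 1e-12
        weights = {u: -math.log((errors[u] + 1e-12) / (total_e + 1e-12))
                   for u in users}
    return truth, weights
```

With readings {1.0, 1.1, 5.0}, the outlier's weight collapses over a few iterations and the truth converges near the agreeing pair, which is the behavior the weighting is meant to produce. Note the sketch operates on raw readings and makes no attempt at the privacy protection the text discusses.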
Intending to cover the existing gap in behavioral datasets modelling the interactions of users with individual and multiple devices in a Smart Office, in order to later authenticate them continuously, we publish the following collection of datasets, generated from five users interacting for 60 days with their personal computers and mobile devices. Below you can find a brief description of each dataset.
While social media has proven to be an exceptionally useful tool for interacting with other people and for spreading helpful information massively and quickly, its great potential has also been leveraged with ill intent to distort political elections and manipulate constituents. In the paper at hand, we analyze the presence and behavior of social bots on Twitter in the context of the November 2019 Spanish general election.
Data have been exported in three formats to provide the maximum flexibility:
- MongoDB Dump BSONs
- To import these data, please refer to the official MongoDB documentation.
- JSON Exports
- Both the users and the tweets collections have been exported as canonical JSON files.
- CSV Exports (only tweets)
- The tweet collection has been exported as a plain CSV file with comma separators.
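For quick inspection, the CSV export can be loaded with the Python standard library alone; a minimal sketch (the file name is illustrative, and a header row is assumed):

```python
import csv

def read_tweets_csv(path):
    """Read the comma-separated tweets export into a list of dicts.
    A header row is assumed (csv.DictReader uses it for the keys)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

# usage (illustrative file name):
# tweets = read_tweets_csv("tweets.csv")
```

For the BSON dump or the canonical JSON exports, the official MongoDB tooling mentioned above remains the appropriate route.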
Cyber-Physical Production Systems (CPPS) are a key enabler of industrial businesses and economic growth. The introduction of the Internet of Things (IoT) into industrial processes represents a new Internet revolution, commonly known as the 4th Industrial Revolution, moving towards the Smart Manufacturing concept. Despite the industry's huge interest in innovating its production systems in order to increase revenues at lower costs, the IoT concept is still immature and fuzzy, which increases security-related risks in industrial systems.
The generation of the dataset containing OPC UA traffic was made possible by the setup and execution of a laboratory CPPS testbed. This CPPS uses the OPC UA standard for horizontal and vertical communications.
The CPPS testbed consists of seven nodes in the network. Each network node is a Raspberry Pi device running the Python FreeOpcUa implementation. In this configuration, there are two production units, each containing three devices, and one node representing a Manufacturing Execution System (MES). Each device implements both an OPC UA server and client: the server publishes sensor-reading updates to an OPC UA variable, and the client subscribes to all OPC UA variables of the other devices in the same production unit. The MES implements only the OPC UA client, which subscribes to all OPC UA variables of all devices in both production units. Also connected to this network is an attack node, as it is assumed that the attacker has already gained access to the CPPS network.
After setting up the CPPS testbed, a Python implementation based on Tshark was used to capture OPC UA packets and export this traffic to a CSV-format dataset. This traffic includes both normal and anomalous behaviour. Anomalous behaviour is produced by the malicious node, which injects attacks into the CPPS network, targeting one or more device nodes and the MES. The attacks selected for the malicious activities are:
- Denial of Service(DoS);
- Eavesdropping or Man-in-the-middle (MITM) attacks;
- Impersonation or Spoofing attacks.
To perform the attacks mentioned above, a Python script is used, which employs the Scapy module for packet sniffing, injection, and modification. For the dataset generation, another Python script based on Tshark (in this case Pyshark) was used to capture only OPC UA packets and export this traffic to a CSV-format dataset. The OPC UA packets are aggregated into bidirectional communication flows, which are characterized by the following 32 features:
- src_ip: Source IP address;
- src_port: Source port;
- dst_ip: Destination IP address;
- dst_port: Destination port;
- flags: TCP flag status;
- pktTotalCount: Total packet count;
- octetTotalCount: Total packet size;
- avg_ps: Average packet size;
- proto: Protocol;
- service: OPC UA service call type;
- service_errors: Number of service errors in OPC UA request responses;
- status_errors: Number of status errors in OPC UA request responses;
- msg_size: OPC UA message transport size;
- min_msg_size: Minimum OPC UA message size;
- flowStart: Timestamp of flow start;
- flowEnd: Timestamp of flow end;
- flowDuration: Flow duration in seconds;
- avg_flowDuration: Average flow duration in seconds;
- flowInterval: Time interval between flows in seconds;
- count: Number of connections to the same destination host as the current connection in the past two seconds;
- srv_count: Number of connections to the same port number as the current connection in the past two seconds;
- same_srv_rate: The percentage of connections that were to the same port number, among the connections aggregated in Count;
- dst_host_same_src_port_rate: The percentage of connections that were to the same source port, among the connections having the same port number;
- f_pktTotalCount: Total forward packets count;
- f_octetTotalCount: Total forward packets size;
- f_flowStart: Timestamp of first forward packet start;
- f_rate: Rate at which forward packets are transmitted;
- b_pktTotalCount: Total backwards packets count;
- b_octetTotalCount: Total backwards packets size;
- b_flowStart: Timestamp of first backwards packet start;
- label: Binary classification label;
- multi_label: Multi-class classification label.
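To make the per-flow aggregation concrete, here is a simplified sketch of how a handful of the features above could be computed from the packets of one bidirectional flow. The Packet type and its field names are illustrative, not the actual generation script:

```python
from dataclasses import dataclass

@dataclass
class Packet:
    """Illustrative packet record for one bidirectional flow."""
    ts: float      # capture timestamp in seconds
    size: int      # packet size in bytes
    forward: bool  # True if the packet travels src -> dst

def flow_features(packets):
    """Compute a subset of the listed flow features (an illustrative sketch;
    the dataset's own script derives these from Pyshark captures)."""
    fwd = [p for p in packets if p.forward]
    bwd = [p for p in packets if not p.forward]
    start = min(p.ts for p in packets)
    end = max(p.ts for p in packets)
    return {
        "pktTotalCount": len(packets),
        "octetTotalCount": sum(p.size for p in packets),
        "avg_ps": sum(p.size for p in packets) / len(packets),
        "flowStart": start,
        "flowEnd": end,
        "flowDuration": end - start,
        "f_pktTotalCount": len(fwd),
        "f_octetTotalCount": sum(p.size for p in fwd),
        "b_pktTotalCount": len(bwd),
        "b_octetTotalCount": sum(p.size for p in bwd),
    }
```

Features such as count, srv_count, and same_srv_rate additionally require a two-second sliding window over neighbouring flows, which is omitted here for brevity.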
The generated dataset has 33,567 normal instances, 74,013 DoS attack instances, 50 impersonation attack instances, and 7 MITM attack instances, for a total of 107,634 instances. Also, all attacks were grouped into one class (anomaly, 1) and the rest of the instances belong to the normal class (0).
For more information, please contact the author: Rui Pinto (email@example.com).
A simple dataset that gives the processing cost (in cycles) for verifying multiple messages signed with ECDSA and implicitly certified public keys. It considers two implicit certification models: ECQV and SIMPL.
This dataset is used in the article "Schnorr-based implicit certification: improving the security and efficiency of vehicular communications", submitted to IEEE Transactions on Computers. Namely, it serves as the basis for that article's Figure 2.
GPS spoofing and jamming are common attacks against UAVs; however, conducting these experiments for research can be difficult in many areas. This dataset consists of logs from a benign flight as well as one in which the UAV experiences GPS spoofing and jamming. The Keysight EXG N5172B signal generator is used to provide the spoofed coordinates, reporting a location in Shanghai, China.
PX4 Autopilot v1.11.3 (https://px4.io) is used for all experiments, running on a Pixhawk 4 flight controller (PX4_FMU_V5) with a Pixhawk GPS receiver. The UAV frame is the Holybro S500. QGroundControl (v4.0.9) is used as the GCS (http://qgroundcontrol.com).
Full flight data is contained in ULOG files (https://dev.px4.io/v1.9.0/en/log/ulog_file_format.html)
CSV files are obtained by conversion using the ulog2csv script (https://github.com/PX4/pyulog/blob/master/pyulog/ulog2csv.py)