Security
The FFT-75 dataset contains randomly sampled, potentially overlapping file fragments from 75 popular file types (see details below). To the best of our knowledge, it is the most diverse and balanced dataset available. The dataset is labeled with class IDs and is ready for training supervised machine learning models. We distinguish 6 scenarios of different granularity and provide variants with 512-byte and 4096-byte blocks. In each case, we sampled a balanced dataset and split the data as follows: 80% for training, 10% for testing, and 10% for validation.
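As an illustration of the sampling and splitting described above, the following is a minimal sketch, not the authors' actual pipeline. The function names `sample_fragments` and `split_80_10_10` are hypothetical, and drawing fragments at arbitrary byte offsets is an assumption; the original dataset may have used sector-aligned offsets or a different procedure.

```python
import random


def sample_fragments(data: bytes, block_size: int, count: int,
                     rng: random.Random) -> list[bytes]:
    """Draw `count` fragments of `block_size` bytes at random offsets.

    Offsets are chosen independently, so fragments may overlap
    (assumption: any byte offset is allowed, not only aligned ones).
    """
    if len(data) < block_size:
        return []  # file too small to yield a full block
    max_start = len(data) - block_size
    starts = (rng.randint(0, max_start) for _ in range(count))
    return [data[s:s + block_size] for s in starts]


def split_80_10_10(items, rng: random.Random):
    """Shuffle and split into 80% train, 10% test, 10% validation."""
    items = list(items)
    rng.shuffle(items)
    n_train = len(items) * 8 // 10  # integer arithmetic avoids float rounding
    n_test = len(items) // 10
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    val = items[n_train + n_test:]  # remainder goes to validation
    return train, test, val


if __name__ == "__main__":
    rng = random.Random(0)
    fake_file = bytes(range(256)) * 64  # 16 KiB stand-in for a real file
    fragments = sample_fragments(fake_file, block_size=512, count=100, rng=rng)
    train, test, val = split_80_10_10(fragments, rng)
    print(len(train), len(test), len(val))
```

The same functions work for the 4096-byte variant by passing `block_size=4096`.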
Modern technologies have made the capture and sharing of digital video commonplace; the combination of modern smartphones, cloud storage, and social media platforms has enabled video to become a primary source of information for many people and institutions. As a result, it is important to be able to verify the authenticity and source of this information, including identifying the source camera model that captured it. While a variety of forensic techniques have been developed for digital images, less research has been conducted on the forensic analysis of videos.
Website fingerprinting attacks, which use statistical analysis on network traffic to compromise user privacy, have been shown to be effective even if the traffic is sent over anonymity-preserving networks such as Tor. The classical attack model used to evaluate website fingerprinting attacks assumes an on-path adversary, who can observe all traffic traveling between the user's computer and the secure network.
Data for the article in the Transactions on Industrial Informatics
The steganography and steganalysis of audio, especially compressed audio, have drawn increasing attention in recent years, and various algorithms have been proposed. However, there is no standard public dataset with which to verify the efficiency of each proposed algorithm. Therefore, to advance the field, we constructed a dataset of 33038 stereo WAV audio clips with a sampling rate of 44.1 kHz and a duration of 10 s. All audio files were crawled from the Internet to better simulate a real detection environment.
The data contain the BTC blockchain ledger (100M) from February 3, 2009 to April 25, 2011, together with analysis results for the ledger: 10 characteristic indicators and 2 money laundering models.
This work intends to identify characteristics in network traffic that distinguish normal network behavior from denial-of-service attacks. One way to classify anomalous traffic is to analyze packet headers. This dataset contains labeled examples of normal traffic (23,088 instances), TCP Flood attacks (14,988 instances), UDP Flood attacks (6,894 instances), HTTP Flood attacks (347 instances), and HTTP Slow attacks (183 instances), described by 73 numeric variables.
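The class counts above are strongly imbalanced, which matters when training a classifier on this data. The sketch below computes each class's share and an inverse-frequency weight. It assumes the counts are whole numbers with a thousands separator (for example, 23088 normal instances), which gives 45500 instances in total. The class keys are hypothetical labels, and inverse-frequency weighting is a common heuristic rather than a method stated by the dataset authors.

```python
# Instance counts per class, taken from the dataset description
# (assumption: the separators in the figures mark thousands).
counts = {
    "normal": 23088,
    "tcp_flood": 14988,
    "udp_flood": 6894,
    "http_flood": 347,
    "http_slow": 183,
}

total = sum(counts.values())

# Fraction of the dataset that each class represents.
shares = {name: n / total for name, n in counts.items()}

# Inverse-frequency weights: total / (number_of_classes * class_count).
# Rare classes get larger weights; a perfectly balanced set gives 1.0 each.
weights = {name: total / (len(counts) * n) for name, n in counts.items()}

for name in counts:
    print(f"{name:>10}: share={shares[name]:.4f} weight={weights[name]:.2f}")
```

Weights like these can be passed to most supervised learners that accept per-class weights, so that the two HTTP attack classes are not drowned out by normal traffic.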