Due to the large number of vulnerabilities in information systems and the continuous activity of attackers, techniques for malicious traffic detection are required to identify and protect against cyber-attacks. Therefore, it  is important to intentionally operate a cyber environment to be invaded and compromised in order to allow security professionals to analyze the evolution of the various attacks and exploited vulnerabilities.

This dataset includes 2016, 2017 and 2018 cyber attacks in the HoneySELK environment.


PCAPs contain attacks targeting several honeypots configured with the following protocols/ports:

  - SSH: 22/TCP

  - HTTP: 80/TCP

  - HTTPS: 443/TCP

  - MYSQL: 3306/TCP

  - FTP: 21/20/TCP

  - DNS: 53/TCP/UDP

  - NTP: 123/UDP

  - TELNET: 23/TCP

  - MSRPC: 135/TCP




This dataset contains a list of popular websites and their privacy statements. The websites belong to the three largest South Asian economies, namely, India, Pakistan, and Bangladesh. Each website is categorized into 10 sectors, namely, e-commerce, finance/banking, education, healthcare, news, government, telecom, buy and sell, job/freelance, blogging/discussion. We hope that this dataset will help researchers in investigating website privacy compliance.


We build an original dataset of thermal videos and images that simulate illegal movements around the border and in protected areas and are designed for training machines and deep learning models. The videos are recorded in areas around the forest, at night, in different weather conditions – in the clear weather, in the rain, and in the fog, and with people in different body positions (upright, hunched) and movement speeds (regu- lar walking, running) at different ranges from the camera.



About 20 minutes of recorded material from the clear weather scenario, 13 minutes from the fog scenario, and about 15 minutes from rainy weather were processed. The longer videos were cut into sequences and from these sequences individual frames were extracted, resulting in 11,900 images for the clear weather, 4,905 images for the fog, and 7,030 images for the rainy weather scenarios.

A total of 6,111 frames were manual annotated so that could be used to train the supervised model for person detection. When selecting the frames, it was taken into account that the selected frames include different weather conditions so that in the set there were 2,663 frames shot in clear weather conditions, 1,135 frames of fog, and 2,313 frames of rain.

The annotations were made using the open-source Yolo BBox Annotation Tool that can simultaneously store annotations in the three most popular machine learning annotation formats YOLO, VOC, and MS COCO so all three annotation formats are available. The image annotation consists of a centroid position of the bounding box around each object of interest, size of the bounding box in terms of width and height, and corresponding class label (Human or Dog).



Presented here is a dataset used for our SCADA cybersecurity research. The dataset was built using our SCADA system testbed described in our paper below [*]. The purpose of our testbed was to emulate real-world industrial systems closely. It allowed us to carry out realistic cyber-attacks.



Provided dataset is cleased, pre-processed, and ready to use. The users may modify as they wish, but please cite the dataset as below.

M. A. Teixeira, M. Zolanvari, R. Jain, "WUSTL-IIOT-2018 Dataset for ICS (SCADA) Cybersecurity Research," 2018. [Online]. Available: https://www.cse.wustl.edu/~jain/iiot/index.html.


It is the HDL files with a submisstion to the IEEE journal.

Last Updated On: 
Fri, 05/01/2020 - 05:55

This repository contains the results of running more than 70 samples of ransomware, from different families, dating  since 2015. It contains the network traffic (DNS and TCP) and the Input/Output (I/O) operations generated by the malware while encrypting a network shared directory. These data are contained in three files for each ransomware sample: one with the information from the DNS requests, other with the TCP connections another one containing the I/O operations. This information can be useful for testing new and old ransomware detection tools and compare their results.


The dataset is organised as one zip file for all text files organised in one directory for each ransomware sample. Although another zip file could be uploaded with all the trace files organised in the same manner as the previous zip file, it was extremely large file (more than 650GB after compression). In order to make the download easier, we have uploaded the trace files in separated zip files, one for each directory or scenario. We have also published in an external website (link) the trace files available to download individually. If a single trace file download is desired, we recommend to visit the website and download it.

For each malware sample three text files are generated (dnsInfo.txt, TCPconnInfo.txt and IOops.txt) and placed in the directory with the ransomware strain’s name. Structure of all the directory and subdirectories are shown in README.pdf file and in the text file “repositoryStructure.txt”.

The I/O operations file contains one text line for each operation (open or close file, read, write, rename, delete, etc). Each line contains fields separated by the blank space character (ASCII 0x20), with the useful metadata about the operation (file name, read and write offset and length, timestamp, etc). The file README.pdf explains all the fields in the I/O operations file.

The DNS info file has one line per each DNS request made by the user machine. The DNS server is ‘’ for all traces. The file README.pdf explains each column. The TCP info file has one line per each TCP connection. In case the connection contains a HTTP request, the method, response code and url are present in this file. As in previous cases, in the README file the columns and structure of file is explained.

We started downloading ransomware samples in 2015 from hybrid-analysis.com and malware-traffic-analysis.com. The samples were executed in one machine and the DNS and HTTP petitions were collected by a traffic probe mirroring the traffic. The ‘infected’ machine has a mounted directory, shared by a server. The content of this directory is encrypted by the ransomware during its activity. The operations over this directory were captured by the same traffic probe and processed with specialised software to extract the I/O operations in the format explained in the README.pdf file.

In order to analyse the ransomware behaviour, we made different shared directories and we ran some samples in both directories. These shared directories follow an statistic distribution for the file sizes and location of each one, trying to simulate users’ fileset. Changing the seed in the generation of the directory we can make similar directories with different number of files, distribution and subdirectories. The trace files of ransomwares run in this cases can be found in zip files named ‘5GvXdirectory.zip’ where X goes from 2 to 10. We have also run samples with shared directory of 10GB size, which trace files are placed in zip file called ‘10Gdirectory’.

We have also run one sample sweeping the network speed for simulating ransomware encrypting the files slowly. These traces can be found in ‘networkSpeed.zip’ file. Finally, the samples run in scenario with Windows 10 user and server generated traffic traces placed in the file ‘W10scenario.zip’. There is not text files for these samples as the traffic is encrypted in the version 3 of SMB protocol (used in Windows 10 machines).

As we have explained above, the traces files can be downloaded individually from an external link but the text files associated to them are placed in a single zip file (it is possible to download them all together due to its smaller size).


Five well-known Border Gateway Anomalies (BGP) anomalies:
WannaCrypt, Moscow blackout, Slammer, Nimda, Code Red I, occurred in May 2017, May 2005, January 2003, September 2001, and July 2001, respectively.
The Reseaux IP Europeens (RIPE) BGP update messages are publicly available from the Network Coordination Centre (NCC) and contain:
WannaCrypt, Moscow blackout, Slammer, Nimda, Code Red I, and regular data: https://www.ripe.net/analyse/.


Raw data from the "route collector rrc 04" are organized in folders labeled by the year and month of the collection date.
Complete datasets for WannaCrypt, Moscow blackout, Slammer, Nimda, and Code Red I are available from the RIPE route collector rrc 04 site:
RIPE NCC: https://www.ripe.net
Analyze: https://www.ripe.net/analyse
Internet Measurements: https://www.ripe.net/analyse/internet-measurements
Routing Information Service (RIS): https://www.ripe.net/analyse/internet-measurements/routing-information-s...
RIS Raw Data: https://www.ripe.net/analyse/internet-measurements/routing-information-s...
rrc04.ripe.net: data.ris.ripe.net/rrc04/
The date of last modification and the size of the datasets are also included.

BGP update messages are originally collected in multi-threaded routing toolkit (MRT) format.
"Zebra-dump-parser" written in Perl is used to extract to ASCII the BGP updated messages.
The 37 BGP features were extracted using a C# tool to generate uploaded datasets (csv files).
Labels have been added based on the periods when data were collected.


1.       Folder “1.SlibCrypto 4DIAC project” includes 4DIAC (1.10.3) project that can be added into the workspace of 4DIAC. It contains the various standalone applications representing security mechanisms and the extended one shown in Fig. 5.



BS-HMS-Dataset is a dataset of the users' brainwave signals and the corresponding hand movement signals from a large number of volunteer participants. The dataset has two parts; (1) Neurosky based Dataset (collected over several months in 2016 from 32 volunteer participants), and (2) Emotiv based Dataset (collected from 27 volunteer participants over several months in 2019). 


There are two folders under each user; session I and sessions II. Each session folder contains four different folders; one for each activity performed by the user. Each activity folder contains .csv files; (1) EEG Data (brainwave.csv), (2) Handmovement Accelerometer Data (accelerometer.csv), and (3) Handmovement Gyroscope Data (gyroscope.csv).

A more deatailed description of the data is given in BS-HMS-Dataset-Documentation.pdf file.

Acknowledgement: This data collection was supported in part by the National Science Foundation (NSF) under grant SaTC-1527795.

Please cite: [1] Diksha Shukla, Sicong Chen, Yao Lu, Partha Pratim Kundu, Ravichandra Malapati, Sujit Poudel, Zhanpeng Jin, Vir Phoha, "Brain Signals and the Corresponding Hand Movement Signals Dataset (BS-HMS-Dataset)", IEEE Dataport, 2019. [Online]. Available: http://dx.doi.org/10.21227/my1k-dd23. Accessed: Dec. 05, 2019.