This dataset is part of my Master's research on malware detection and classification using the XGBoost library on NVIDIA GPUs. It is a collection of 1.55 million samples, each described by 1,000 API import features extracted from the JSONL files of the EMBER 2017 (v2) and EMBER 2018 datasets. All data has been preprocessed and duplicate records removed. The dataset contains 800,000 malware and 750,000 "goodware" samples.



Column name:  sha256

Description: SHA-256 hash of the sample

Type: string


Column name:  appeared

Description: Date on which the sample first appeared

Type: date (yyyy-mm format)


Column name:  label

Description: Whether the sample is malware or "goodware"

Type: 0 ("goodware") or 1 (malware)


Column name: GetProcAddress

Description: Most frequently imported function (rank 1)

Type: 0 (Not imported) or 1 (Imported)



Column name: LookupAccountSidW

Description: Least frequently imported function among the 1,000 features (rank 1000)

Type: 0 (Not imported) or 1 (Imported)
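The binary import columns can be illustrated with a minimal sketch of how one EMBER JSONL record might be turned into a row of this dataset. It assumes EMBER's raw `imports` field (a mapping from DLL name to a list of imported function names) and that functions are matched by name regardless of which DLL they come from; the author's actual processing code is not reproduced here, and `encode_sample` and `feature_names` are hypothetical names.

```python
import json
from typing import Dict, Optional, Sequence


def encode_sample(jsonl_line: str, feature_names: Sequence[str]) -> Optional[Dict[str, object]]:
    """Turn one EMBER JSONL record into a row with 0/1 import columns.

    Assumes EMBER's raw "imports" field: {dll_name: [function_name, ...]}.
    Returns None for unlabeled records (EMBER marks those with label -1).
    """
    record = json.loads(jsonl_line)
    if record.get("label") not in (0, 1):
        return None  # skip unlabeled samples; this dataset only has 0/1 labels
    # Flatten imports across all DLLs; columns are keyed by function name only (assumption).
    imported = {fn for funcs in record.get("imports", {}).values() for fn in funcs}
    row: Dict[str, object] = {
        "sha256": record["sha256"],
        "appeared": record["appeared"],  # yyyy-mm string in EMBER
        "label": record["label"],        # 0 = goodware, 1 = malware
    }
    for name in feature_names:
        row[name] = 1 if name in imported else 0  # 1 = imported, 0 = not imported
    return row
```

In practice, `feature_names` would be the 1,000 column names from the full features header, from GetProcAddress through LookupAccountSidW.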


The full dataset features header can be downloaded at

All processing code will be uploaded to


Three well-known Border Gateway Protocol (BGP) anomalies, WannaCrypt, Moscow blackout, and Slammer, occurred in May 2017, May 2005, and January 2003, respectively.
The Route Views BGP update messages covering WannaCrypt, Moscow blackout, and Slammer are publicly available from the University of Oregon Route Views Project:


Raw data from the "route collector route-views2" are organized in folders labeled by the year and month of the collection date.
Complete datasets for WannaCrypt, Moscow blackout, and Slammer are available from the Route Views route collector route-views2 site:
University of Oregon Route Views Project:
Route Views Collector Map:
University of Oregon Route Views Archive Project:
MRT format RIBs and UPDATEs (quagga bgpd, from
The date of last modification and the size of the datasets are also included.

BGP update messages are originally collected in Multi-threaded Routing Toolkit (MRT) format.
The Perl tool "zebra-dump-parser" is used to convert the BGP update messages to ASCII.
The 37 BGP features were extracted with a C# tool to generate the uploaded datasets (CSV files).
Labels were assigned according to the time periods during which the data were collected.
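The feature-extraction and labeling steps can be sketched as follows. This is not the C# tool used for the uploaded CSV files: it computes only three illustrative per-minute quantities (announcements, withdrawals, and unique announced prefixes, assumed here to be representative of the volume-type features involved) from updates that are assumed to be already parsed out of the ASCII dump into (timestamp, kind, prefix) tuples. A bin is labeled 1 when it starts inside a given anomaly window. `per_minute_features`, `label_rows`, and the window bounds are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# One parsed update: (unix_timestamp_seconds, kind, prefix); kind "A" = announcement, "W" = withdrawal.
Update = Tuple[int, str, str]


def per_minute_features(updates: Iterable[Update], bin_seconds: int = 60) -> List[Dict[str, int]]:
    """Count announcements, withdrawals and unique announced prefixes per time bin.

    Bins that contain no updates are omitted from the output.
    """
    ann: Dict[int, int] = defaultdict(int)
    wd: Dict[int, int] = defaultdict(int)
    prefixes: Dict[int, Set[str]] = defaultdict(set)
    for ts, kind, prefix in updates:
        start = ts - ts % bin_seconds  # start of the bin this update falls into
        if kind == "A":
            ann[start] += 1
            prefixes[start].add(prefix)
        elif kind == "W":
            wd[start] += 1
    return [
        {"bin_start": b, "announcements": ann[b], "withdrawals": wd[b],
         "unique_announced_prefixes": len(prefixes[b])}
        for b in sorted(set(ann) | set(wd))
    ]


def label_rows(rows: List[Dict[str, int]], anomaly_start: int, anomaly_end: int) -> List[Dict[str, int]]:
    """Set label 1 for bins starting inside [anomaly_start, anomaly_end), else 0."""
    for row in rows:
        row["label"] = 1 if anomaly_start <= row["bin_start"] < anomaly_end else 0
    return rows
```

Skipping empty bins keeps the sketch short; a complete feature table would more likely emit an all-zero row for every quiet minute so that the time series has no gaps.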


As an alternative to classical cryptography, Physical Layer Security (PhySec) provides primitives to achieve fundamental security goals such as confidentiality, authentication, and key derivation. Because PhySec originates in information theory, its primitives are rigorously analysed and their information-theoretic security is proven. Nevertheless, practical realizations of the different approaches take certain assumptions about the physical world for granted.


The data is provided as zipped NumPy arrays (.npz files) with custom headers. NumPy is required to load a file.

NumPy's load function allows the datasets to be loaded in a straightforward way.

To load a file “file.npz”, the following code is sufficient:

import numpy as np

# allow_pickle=False refuses to load object arrays, which would require unpickling
measurement = np.load('file.npz', allow_pickle=False)

header, data = measurement['header'], measurement['data']
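Because an .npz file written by `numpy.savez` or `numpy.savez_compressed` is a ZIP archive holding one `<name>.npy` member per array, its contents can be listed with the standard library alone, for example to confirm that a file contains the `header` and `data` arrays before loading it. A minimal sketch; `list_npz_arrays` is a hypothetical helper, not part of the supplementary script.

```python
import zipfile
from typing import List


def list_npz_arrays(path: str) -> List[str]:
    """Return the array names stored in an .npz file.

    An .npz is a ZIP archive with one "<name>.npy" member per array.
    """
    with zipfile.ZipFile(path) as zf:
        return sorted(n[: -len(".npy")] for n in zf.namelist() if n.endswith(".npy"))
```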

The dataset comes with a supplementary script illustrating the basic usage of the dataset.


Design and fabrication outsourcing has made integrated circuits vulnerable to malicious modifications by third parties, known as hardware Trojans (HTs). Over the last decade, the use of side-channel measurements to detect malicious manipulation of a chip has been extensively studied. However, the proposed approaches mostly suffer from two major limitations: reliance on a trusted identical chip (i.e., a golden chip), and the untraceable footprints of subtle hardware Trojans that remain inactive during the testing phase.


See the attached document.


This dataset was created for the following paper: Seonghoon Jeong, Boosun Jeon, Boheung Chung, and Huy Kang Kim, "Convolutional neural network-based intrusion detection system for AVTP streams in automotive Ethernet-based networks," Vehicular Communications, DOI: 10.1016/j.vehcom.2021.100338.



The following devices are connected to the automotive Ethernet testbed:

  • a RAD-Galaxy: BroadR-Reach switch
  • two neoECU AVB/TSN (AVB/TSN Endpoint Simulation): configured as an AVB talker and an AVB listener, respectively
  • a RAD-Moon: a media converter (between BroadR-Reach and Ethernet)
  • a USB camera connected to the AVB talker

The dataset contains four benign (attack-free) packet captures. 

  • driving_01_original.pcap (about 10 min)
  • driving_02_original.pcap (about 16 min)
  • indoors_01_original.pcap (about 24 min)
  • indoors_02_original.pcap (about 21 min)
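A quick way to inspect these captures without extra tools is to count Ethernet frames by EtherType. The sketch below assumes the files use the classic libpcap format (not pcapng) with Ethernet link-layer headers, and that AVTP frames carry EtherType 0x22F0 (IEEE 1722), possibly behind a single 802.1Q VLAN tag; `count_ethertypes` is a hypothetical helper, not part of the dataset.

```python
import struct
from collections import Counter

AVTP_ETHERTYPE = 0x22F0  # IEEE 1722 AVTP
VLAN_ETHERTYPE = 0x8100  # IEEE 802.1Q tag


def _byte_order(magic: bytes) -> str:
    """Byte order of a classic pcap file from its magic number (micro- or nanosecond variant)."""
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        return "<"
    if magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        return ">"
    raise ValueError("not a classic pcap file (pcapng is not handled)")


def count_ethertypes(path: str) -> Counter:
    """Count frames per EtherType, looking through one VLAN tag if present."""
    counts: Counter = Counter()
    with open(path, "rb") as f:
        header = f.read(24)  # pcap global header
        if len(header) < 24:
            raise ValueError("truncated pcap global header")
        bo = _byte_order(header[:4])
        linktype = struct.unpack(bo + "I", header[20:24])[0] & 0x0FFFFFFF
        if linktype != 1:  # 1 = Ethernet
            raise ValueError(f"expected Ethernet link type 1, got {linktype}")
        while True:
            rec = f.read(16)  # per-packet record header
            if len(rec) < 16:
                break
            _, _, incl_len, _ = struct.unpack(bo + "IIII", rec)
            frame = f.read(incl_len)
            if len(frame) < 14:
                continue  # too short to hold an Ethernet header
            ethertype = struct.unpack("!H", frame[12:14])[0]
            if ethertype == VLAN_ETHERTYPE and len(frame) >= 18:
                ethertype = struct.unpack("!H", frame[16:18])[0]  # skip the 4-byte tag
            counts[ethertype] += 1
    return counts
```

For example, `count_ethertypes("driving_01_original.pcap")[AVTP_ETHERTYPE]` would give the number of AVTP frames in that capture, provided the assumptions above hold.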


We assume that an attacker injects arbitrary stream AVTP data units (AVTPDUs) into the in-vehicle network (IVN). The attacker's goal is to make a terminal application connected to the AVB listener output a single video frame by injecting previously generated AVTPDUs over a certain period. To demonstrate the attack, we extracted 36 consecutive stream AVTPDUs (single-MPEG-frame.pcap) from one of our AVB datasets; together, the extracted AVTPDUs constitute one video frame. The attacker then performs a replay attack by sending the 36 stream AVTPDUs repeatedly. See the *_injected.pcap files for the results of the replay attack.
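The replay step can be sketched as a send schedule that cycles through the 36 extracted AVTPDUs. This only computes (time, frame) pairs; actually transmitting them (for instance through a raw socket or the testbed tools) is out of scope, and the fixed inter-frame gap and integer microsecond timing are assumptions, since the text does not state the timing used. `replay_schedule` is a hypothetical helper.

```python
from typing import Iterator, Sequence, Tuple


def replay_schedule(frames: Sequence[bytes], start_us: int, duration_us: int,
                    gap_us: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (send_time_us, frame) pairs cycling through `frames` in order.

    Frames are spaced by a fixed gap, and the cycle restarts after the last
    frame until `duration_us` has elapsed.
    """
    if not frames:
        raise ValueError("need at least one frame to replay")
    if gap_us <= 0:
        raise ValueError("gap_us must be positive")
    i = 0
    while i * gap_us < duration_us:
        yield start_us + i * gap_us, frames[i % len(frames)]
        i += 1
```

Repeating the same ordered set keeps each replayed burst a complete frame, which matches the goal of forcing the listener to display one injected image.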


To open the packet captures, we recommend researchers use Wireshark and the following plug-ins:



This work was supported by Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (No. 2018-0-00312, Developing technologies to predict, detect, respond, and automatically diagnose security threats to automotive Ethernet-based vehicle).



This dataset was provided and collected during the "Car Hacking: Attack & Defense Challenge" held in 2020. We were the main organizer of the competition, along with Culture Makers and the Korea Internet & Security Agency. We are proud to release these valuable datasets to all security researchers free of charge.

The competition aimed to develop attack and detection techniques for the Controller Area Network (CAN), a widely used in-vehicle network standard. The target vehicle of the competition was a Hyundai Avante CN7.


1. Description

Round | Type | Description | # Normal | # Attack | # Rows
Preliminary | Training | Normal traffic and four types of attacks, with class labels | 3,372,743 | 299,408 | 3,672,151
Preliminary | Submission | Normal traffic and four types of attacks, with class labels (released without class labels during the competition) | | |
Final | Submission | Normal traffic and five attacks (4 spoofing, 1 fuzzing), with class labels (released without class labels during the competition) | | |
  • The preliminary round contains traffic from two vehicle states: S (stationary) and D (driving).
    In the final round, only stationary traffic was collected, for safety reasons.

  • All CSV files have the same header fields: Timestamp (logging time), Arbitration_ID (CAN identifier), DLC (data length code), Data (CAN data field), Class (Normal or Attack), and SubClass (attack type) of each CAN message.
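Given these header fields, the CSV files can be read with the standard library. A minimal sketch, assuming that Arbitration_ID is a hexadecimal string and that Data holds hex bytes, optionally separated by spaces (for example "20 A1 10 FF"); the exact value formatting is an assumption, and `parse_can_rows` and `label_counts` are hypothetical helpers.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, List


def parse_can_rows(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Parse rows of a challenge CSV into typed fields."""
    rows: List[Dict[str, object]] = []
    for rec in csv.DictReader(lines):
        rows.append({
            "timestamp": float(rec["Timestamp"]),
            "can_id": int(rec["Arbitration_ID"], 16),  # assumed hexadecimal
            "dlc": int(rec["DLC"]),
            "data": bytes.fromhex(rec["Data"]),  # fromhex skips whitespace between bytes
            # Files distributed without labels lack these columns; default to "".
            "class": rec.get("Class", ""),
            "subclass": rec.get("SubClass", ""),
        })
    return rows


def label_counts(rows: List[Dict[str, object]]) -> Counter:
    """Count messages per (Class, SubClass) pair."""
    return Counter((r["class"], r["subclass"]) for r in rows)
```

A file would be read with `open(path, newline="")` and passed directly to `parse_can_rows`, as the csv module recommends.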


2. Class

Normal: Normal traffic in CAN bus.

Attack: Injected attack traffic. Four types of attacks are included: flooding, spoofing, replay, and fuzzing.

  • Flooding: The flooding attack aims to consume CAN bus bandwidth by sending a massive number of messages.

  • Spoofing: CAN messages are injected to control a certain desired function.

  • Replay: The replay attack extracts normal traffic captured at a specific time and replays (injects) it into the CAN bus.

  • Fuzzing: Random messages are injected to cause unexpected behavior of the vehicle.
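These attack types differ in how visible they are to a simple per-ID message-rate check: flooding and fuzzing tend to introduce IDs or rates never seen in normal traffic, whereas spoofing and replay reuse legitimate IDs and only stand out if the injection noticeably raises an ID's frequency. The sketch below is an illustrative baseline under that assumption, not the competition's reference detector; the window size, the margin, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Message = Tuple[float, int]  # (timestamp, CAN ID); any consistent time unit


def window_counts(messages: Iterable[Message], window: float) -> Dict[int, Counter]:
    """Count messages per CAN ID in fixed, non-overlapping time windows."""
    bins: Dict[int, Counter] = defaultdict(Counter)
    for ts, can_id in messages:
        bins[int(ts // window)][can_id] += 1
    return bins


def fit_baseline(normal: Iterable[Message], window: float) -> Dict[int, int]:
    """Highest per-window count of each CAN ID seen in attack-free traffic."""
    baseline: Dict[int, int] = {}
    for counter in window_counts(normal, window).values():
        for can_id, n in counter.items():
            baseline[can_id] = max(baseline.get(can_id, 0), n)
    return baseline


def detect(messages: Iterable[Message], window: float, baseline: Dict[int, int],
           margin: float = 1.5) -> List[Tuple[float, int, int]]:
    """Return (window_start, can_id, count) wherever count exceeds margin * baseline.

    IDs never seen in normal traffic have baseline 0, so any occurrence is flagged.
    """
    alerts: List[Tuple[float, int, int]] = []
    for b, counter in sorted(window_counts(messages, window).items()):
        for can_id, n in sorted(counter.items()):
            if n > margin * baseline.get(can_id, 0):
                alerts.append((b * window, can_id, n))
    return alerts
```

The baseline would be fitted on Normal rows only (for example, using the Timestamp and Arbitration_ID columns), then applied to the mixed traffic.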


3. Acknowledgement

This work was supported by Institute for Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2020-0-00866, Challenges for next generation security R&D).


Bluetooth communication is widely adopted in IoMT devices due to its various benefits. Nevertheless, because of its simplicity as a personal wireless communication protocol, Bluetooth lacks security mechanisms, which may result in devastating outcomes for patients treated with wireless medical devices.


Figure Data Backup for paper, "Co-Evolution of Malware Threats in the U.S. Commercial Sector and Defense Industrial Base"


Producing secure software is challenging. The poor usability of security Application Programming Interfaces (APIs) makes this even harder. Many recommendations, rooted in wider best-practice guidance in software engineering and API design, have been proposed to support developers by improving the usability of cryptography libraries and APIs. In this systematic literature review (SLR), we systematize knowledge regarding these recommendations.


The dataset contains continuously generated memory dump data. For our experiment, we implemented a volatile data dump module that generated around 360 VM memory dump images with an average size of 800 MB each (288 GB in total). These files were compressed with the gzip utility and then zipped into a single 79.5 GB file of memory evidence.
Of this preserved and stored memory dump dataset, 79 files (17.3 GB) were generated during the attack. This means that 21.76% of the data (by size) is potential evidence.
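The size figures above can be cross-checked with a short calculation. This assumes decimal units (1 GB = 1,000 MB) and takes the 21.76% as relative to the 79.5 GB compressed archive, which is the ratio the stated numbers reproduce:

```latex
\begin{align*}
360 \times 800\ \text{MB} &= 288{,}000\ \text{MB} = 288\ \text{GB} \\
\frac{17.3\ \text{GB}}{79.5\ \text{GB}} &\approx 0.2176 = 21.76\%
\end{align*}
```

By file count, the share is slightly different: 79 of 360 files is about 21.9%.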