This dataset contains road network information for Chengdu, with travel time data for four time slots: weekday peak hours, weekday off-peak hours, weekend peak hours, and weekend off-peak hours.
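
As a rough sketch of how slot-dependent travel times might be used, the snippet below runs Dijkstra's algorithm over a small directed road graph with the edge weights of one chosen time slot. The edge layout, the slot names in `SLOTS`, the function name `fastest_time`, and the sample roads are all assumptions for illustration; the dataset's actual file format is not described here.

```python
import heapq

# Hypothetical slot names; the dataset's own labels may differ.
SLOTS = ("weekday_peak", "weekday_offpeak", "weekend_peak", "weekend_offpeak")

def fastest_time(edges, source, target, slot):
    """Dijkstra over a directed road graph using travel times for one slot.

    edges: iterable of (from_node, to_node, {slot_name: travel_time}).
    Returns the minimum travel time, or None if target is unreachable.
    """
    graph = {}
    for u, v, times in edges:
        graph.setdefault(u, []).append((v, times[slot]))
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, weight in graph.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(queue, (new_cost, nxt))
    return None

# Made-up sample roads with one travel time (seconds) per slot.
roads = [
    ("A", "B", dict(zip(SLOTS, (120, 60, 90, 50)))),
    ("B", "C", dict(zip(SLOTS, (120, 60, 90, 50)))),
    ("A", "C", dict(zip(SLOTS, (200, 150, 170, 140)))),
]
print(fastest_time(roads, "A", "C", "weekday_peak"))
print(fastest_time(roads, "A", "C", "weekday_offpeak"))
```

In this made-up example the direct road is faster at the weekday peak, while the two-hop route is faster off-peak, which is the kind of difference the four time slots make visible.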


This dataset was provided and collected during the "Car Hacking: Attack & Defense Challenge" in 2020. We were the main organizer of the competition, along with Culture Makers and the Korea Internet & Security Agency. We are proud to release these datasets free of charge to all security researchers.

The competition aimed to develop attack and detection techniques for the Controller Area Network (CAN), a widely used standard for in-vehicle networks. The target vehicle of the competition was a Hyundai Avante CN7.


1. Description

Round | Type | Description | # Normal | # Attack | # Rows
Preliminary | Training | Normal and four types of attacks dataset with class | 3,372,743 | 299,408 | 3,672,151
Preliminary | Submission | Normal and four types of attacks dataset with class (during the competition, without class) | | |
Final | Submission | Normal and five attacks (4 spoofings, 1 fuzzing) dataset with class (during the competition, without class) | | |
  • The preliminary round covers two vehicle states -- S: Stationary, D: Driving.
    In the final round, only stationary traffic was collected, for safety reasons.

  • All CSV files share the same header, which describes each CAN message: Timestamp (logging time), Arbitration_ID (CAN identifier), DLC (data length code), Data (CAN data field), Class (Normal or Attack), and SubClass (attack type).
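
As a minimal sketch of reading one of these files, the snippet below parses rows with Python's standard `csv` module and tallies the `Class` and `SubClass` columns. The inline sample rows, the space-separated hex layout of `Data`, the `Normal` value in `SubClass` for normal traffic, and the helper name `summarize` are assumptions for illustration, not taken from the dataset.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; real files are assumed to use the same header.
SAMPLE = """Timestamp,Arbitration_ID,DLC,Data,Class,SubClass
1597759710.1258,153,8,20 A1 10 FF 00 FF 50 1F,Normal,Normal
1597759710.1260,220,8,13 24 7F 60 05 FF BF 10,Normal,Normal
1597759710.1262,000,8,00 00 00 00 00 00 00 00,Attack,Flooding
"""

def summarize(handle):
    """Count messages per Class and per SubClass from an open CSV handle."""
    by_class, by_subclass = Counter(), Counter()
    for row in csv.DictReader(handle):
        by_class[row["Class"]] += 1
        by_subclass[row["SubClass"]] += 1
    return by_class, by_subclass

by_class, by_subclass = summarize(io.StringIO(SAMPLE))
print(dict(by_class))
print(dict(by_subclass))
```

For a real file, the same function can be given `open(path, newline="")` in place of the `io.StringIO` object.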


2. Class

Normal: Normal traffic in CAN bus.

Attack: Injected attack traffic. Four types of attacks are included: Flooding, Spoofing, Replay, and Fuzzing.

  • Flooding: A flooding attack aims to consume CAN bus bandwidth by sending a massive number of messages.

  • Spoofing: CAN messages are injected to control a desired vehicle function.

  • Replay: A replay attack captures normal traffic at a specific time and replays (injects) it into the CAN bus.

  • Fuzzing: Random messages are injected to cause unexpected behavior of the vehicle.
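
To illustrate why flooding in particular stands out in the traffic, the sketch below counts messages per fixed time window and flags windows whose count exceeds a threshold. This is an illustrative heuristic only, not a detection method used in the competition; the window length, the threshold, the function name `flag_busy_windows`, and the sample timestamps are assumptions.

```python
from collections import Counter

def flag_busy_windows(timestamps, window=1.0, threshold=5):
    """Return start times of windows holding more than `threshold` messages.

    timestamps: message arrival times in seconds (any order).
    window and threshold are illustrative values, not tuned on the dataset.
    """
    counts = Counter(int(t // window) for t in timestamps)
    return sorted(idx * window for idx, n in counts.items() if n > threshold)

# Made-up traffic: two messages per second, plus a burst starting at t = 3 s.
normal = [i * 0.5 for i in range(10)]
flood = [3.0 + i * 0.05 for i in range(10)]
print(flag_busy_windows(normal + flood))
```

A rate count like this says nothing about spoofing, replay, or fuzzing, which inject far fewer messages and need content- or timing-based features instead.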


3. Acknowledgement

This work was supported by an Institute for Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2020-0-00866, Challenges for next generation security R&D).


Data for outlier test


We disclose a traffic landmark dataset for detection. The dataset, generated with our framework, includes about 150,000 images and annotations of about 470,000 traffic landmarks. It was collected in an urban area of Seoul and in suburban areas of Suwon, Hwaseong, Yongin, and Seongnam in South Korea at different times of the day. Images taken in the morning or evening included many saturated areas due to exposure to direct sunlight. Most images taken in late-evening light were low-contrast. Images taken at noon included reflections from the windshield.


Training, test, and validation data derived from real-time packet data captured in Sonic Firewall are attached.


An open dataset from the Machine Learning Repository of the Center for Machine Learning and Intelligent Systems at the University of California, Irvine.


This dataset on the 26/11 Mumbai attack is based on the India Ministry of External Affairs dossier on the 2008 Mumbai terrorist attacks and on news reports. Ten terrorists operated in India, divided into five subgroups. According to the reports, three other people also came to light who were in continuous contact with these terrorists from Pakistan and gave them instructions.


The datasets consist of operational data and detailed information for three inverter transformers in a 3.275 MW PV plant on the outskirts of Brisbane, Australia. The data include load current, top-oil temperature, moisture in top oil, ambient temperature, solar irradiance, and individual current harmonics (up to the 31st order). The time interval of the data is either 1 minute or 3 seconds, depending on the data type. The data can be used to study the ageing of inverter transformers in this PV plant.
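
Since the data include individual current harmonics, one quantity that can be derived from them is the total harmonic distortion (THD) of the load current: the RMS of the harmonic components of order 2 and above, divided by the fundamental. The sketch below assumes the harmonics are available as RMS magnitudes keyed by order; the function name `current_thd` and the sample values are illustrative and not taken from the dataset.

```python
import math

def current_thd(harmonics, max_order=31):
    """Total harmonic distortion of a current as a fraction of the fundamental.

    harmonics: dict mapping harmonic order (1 = fundamental) to RMS magnitude.
    Orders above max_order are ignored; orders absent from the dict count as zero.
    """
    fundamental = harmonics[1]
    if fundamental <= 0:
        raise ValueError("fundamental magnitude must be positive")
    distortion = math.sqrt(sum(
        mag ** 2 for order, mag in harmonics.items() if 2 <= order <= max_order
    ))
    return distortion / fundamental

# Made-up magnitudes in amperes: fundamental plus 3rd and 5th harmonics.
sample = {1: 100.0, 3: 3.0, 5: 4.0}
print(f"THD = {current_thd(sample):.1%}")
```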


Data for the study were retrieved from a publicly available dataset of Bondora, a leading European P2P lending platform. The retrieved data are a pool of both defaulted and non-defaulted loans from the period between 1st March 2009 and 27th January 2020. The data comprise demographic and financial information about borrowers, along with loan transactions. In P2P lending, loans are typically uncollateralized, and lenders seek higher returns as compensation for the financial risk they take.


The dataset also includes a data-preprocessing Jupyter notebook that helps in working with the data and performing basic preprocessing. The zip file contains both the pre-processed dataset and the raw dataset extracted directly from the Bondora website.

The preprocessing choices in the attached notebook reflect my own intuition and assumptions.



This dataset is a set of eighteen directed networks that represent message exchanges among Twitter accounts during eighteen crisis events. The dataset comprises 645,339 anonymized unique user IDs and 1,396,709 edges, each labeled with one of Plutchik's basic emotions (anger, fear, sadness, disgust, joy, trust, anticipation, and surprise) or "neutral" (if a tweet conveys no emotion).
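
A minimal sketch of how such a labeled directed network might be processed: the snippet below tallies edges per emotion label and computes each account's in-degree. The `(source, target, emotion)` tuple layout, the sample IDs, and the function name `summarize_edges` are assumptions; the dataset's actual file format is not described here.

```python
from collections import Counter

# Label set as listed in the description: eight emotions plus "neutral".
EMOTIONS = {"anger", "fear", "sadness", "disgust", "joy",
            "trust", "anticipation", "surprise", "neutral"}

def summarize_edges(edges):
    """Return (edges per emotion label, in-degree per target account)."""
    per_emotion, in_degree = Counter(), Counter()
    for source, target, emotion in edges:
        if emotion not in EMOTIONS:
            raise ValueError(f"unknown label: {emotion!r}")
        per_emotion[emotion] += 1
        in_degree[target] += 1
    return per_emotion, in_degree

# Made-up edges: (sender ID, receiver ID, emotion label).
edges = [("u1", "u2", "fear"), ("u3", "u2", "fear"), ("u2", "u1", "neutral")]
per_emotion, in_degree = summarize_edges(edges)
print(per_emotion.most_common())
print(in_degree.most_common(1))
```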