Cyber Threat Intelligence (CTI) dataset generated from public security reports and malware repositories

Citation Author(s):
Daegeon Kim, Korea University
Huy Kang Kim, Korea University
Submitted by:
Daegeon Kim
Last updated:
Sat, 01/22/2022 - 01:33
DOI:
10.21227/dpat-qd69

Abstract 

This dataset contains Cyber Threat Intelligence (CTI) data generated from public security reports and malware repositories.

The dataset is stored in a structured format (XML) and includes approximately 640,000 records from 612 security reports published from January 2008 to June 2019.

Several data types are contained in this dataset such as URL, host, IP address, e-mail account, hashes (MD5, SHA1, and SHA256), common vulnerabilities and exposures (CVE), registry, file names ending with specific extensions, and the program database (PDB) path.
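As a rough illustration of how the data types listed above differ in shape, the following Python sketch guesses the type of a single indicator string. It is not the method used to build the dataset: the function name `classify_indicator`, the regular expressions, and the order of the checks are all assumptions made for this example. It relies only on well-known properties (MD5, SHA1, and SHA256 digests are 32, 40, and 64 hexadecimal characters; CVE identifiers follow the `CVE-YYYY-NNNN` pattern).

```python
import ipaddress
import re

# Hypothetical patterns for illustration only; the dataset's own
# extraction rules may differ.
HASH_LENGTHS = {32: "MD5", 40: "SHA1", 64: "SHA256"}
HEX_RE = re.compile(r"[0-9a-fA-F]+")
CVE_RE = re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE)
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def classify_indicator(value: str) -> str:
    """Return a best-guess type label for one indicator string."""
    value = value.strip()

    # Hashes: pure hexadecimal with a digest-specific length.
    if HEX_RE.fullmatch(value) and len(value) in HASH_LENGTHS:
        return HASH_LENGTHS[len(value)]

    if CVE_RE.fullmatch(value):
        return "CVE"

    # ipaddress.ip_address raises ValueError for anything that is not
    # a valid IPv4 or IPv6 literal.
    try:
        ipaddress.ip_address(value)
        return "IP address"
    except ValueError:
        pass

    # Check URLs before e-mail accounts so that URLs containing "@"
    # are not mislabeled.
    if value.lower().startswith(("http://", "https://")):
        return "URL"
    if EMAIL_RE.fullmatch(value):
        return "e-mail account"
    if value.lower().endswith(".pdb"):
        return "PDB path"
    return "unknown"
```

Registry keys, host names, and other file extensions are left as "unknown" here; handling them would need rules that this page does not describe.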

Instructions: 

For more information about the dataset and the system that generated it, please see the following paper:

Daegeon Kim and Huy Kang Kim, “Automated Dataset Generation System for Collaborative Research of Cyber Threat Analysis,” Security and Communication Networks, vol. 2019, Article ID 6268476, 10 pages, 2019. https://doi.org/10.1155/2019/6268476.

Comments

Please provide access.

Submitted by prabhjot kaur on Wed, 03/03/2021 - 08:42

Hello,
Why is it specified that the dataset is in JSON format, but the one available for download is in XML format?
Thank you

Submitted by Romeo Bigodo Ngueyep on Thu, 05/27/2021 - 16:41

It should be on this page - right-hand side; ctrl-f "CTIDataset.zip" (no quotes - and go to the second result now that this comment will steal the first one)

Submitted by A Miller on Thu, 05/27/2021 - 16:52

Sorry, but this dataset is not in JSON format.
Please rephrase this part of the dataset description.
Thank you

Submitted by Romeo Bigodo Ngueyep on Thu, 11/18/2021 - 17:30

Correct!
The dataset format is XML.
The description is modified now.
Thank you.

Submitted by Daegeon Kim on Sat, 01/22/2022 - 01:39

Hello, the dataset in "CTIDataset.zip" is in XML format; I cannot find the JSON format. If possible, could you send a copy to my mailbox? 1120382898@qq.com Thank you

Submitted by chen jerry on Sun, 10/10/2021 - 03:50

You can easily change the format from XML to JSON using free converters.

Submitted by Daegeon Kim on Sat, 01/22/2022 - 01:46
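
For readers who prefer a script to an online converter, the conversion mentioned above can be sketched in Python with the standard library alone. This is a generic, schema-agnostic sketch: the helper names `element_to_dict` and `xml_to_json`, the `@attributes` and `#text` keys, and the sample XML are all assumptions for illustration and do not reflect the actual structure of the files in CTIDataset.zip.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Recursively convert an XML element into plain dicts and lists."""
    node = {}
    if elem.attrib:
        node["@attributes"] = dict(elem.attrib)
    text = (elem.text or "").strip()
    if text:
        node["#text"] = text
    # Children with the same tag are collected into one list so that
    # repeated elements are not overwritten.
    for child in elem:
        node.setdefault(child.tag, []).append(element_to_dict(child))
    return node


def xml_to_json(xml_string: str) -> str:
    """Parse an XML string and return an indented JSON string."""
    root = ET.fromstring(xml_string)
    return json.dumps({root.tag: element_to_dict(root)}, indent=2)


# Hypothetical sample; the real dataset uses its own element names.
sample = '<report><indicator type="md5">abc</indicator></report>'
print(xml_to_json(sample))
```

Every child element becomes a list, even when it occurs once, which keeps the output shape predictable at the cost of some verbosity. For large files, a streaming parser such as `ET.iterparse` may be a better fit than loading the whole document at once.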
