Name: Chinese cybersecurity event dataset
Creator: Bingzhi Xu
License: https://creativecommons.org/licenses/by/4.0/
Keywords: Artificial Intelligence

Abstract

This paper introduces CSED, a new dataset for Chinese cybersecurity event detection (ED). We collected approximately 18,000 news articles related to cybersecurity. Drawing on the classification of cybersecurity event types in CAISE [38], we define two event types, Attack and Vulnerability, and subdivide them into nine sub-event types: Data Breach, Phishing, Ransom, DDoS Attack, Malware, Supply Chain, Vulnerability Impact, Vulnerability Discovery, and Vulnerability Patch. Sentences that do not contain any specific event are labeled ‘NA’. The key to annotating cybersecurity events is identifying trigger words; carefully selected trigger words can significantly improve the efficiency of subsequent event recognition. We established annotation rules under which only the most representative event is annotated when a sentence contains multiple events of different types. This avoids unnecessary redundancy and keeps the dataset refined. CSED includes 2,054 event instances, 2 event types, and 9 sub-types.

Instructions:

{"id": "62831821e846a66a184aef35", "sentence": "就在国泰航空乘客数据泄露的消息传出几周前，英国航空(British Airways)透露，在两周内，数十万乘客的信用卡信息被盗.", "tokens": ["就", "在", "国泰", "航空", "乘客", "数据", "泄露", "的", "消息", "传出", "几周", "前", "，", "英国", "航空", "(", "British", " ", "Airways", ")", "透露", "，", "在", "两周", "内", "，", "数十万", "乘客", "的", "信用卡", "信息", "被盗", "."], "trigger": "数据泄露", "trigger_positions": [[8, 12]], "eventype": "数据泄露", "eventype_id": 0}
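
Judging from the sample record above, `trigger_positions` appears to hold [start, end) character offsets into `sentence` rather than indices into `tokens`. This is an inference from a single record, not a documented guarantee. The Python sketch below checks that reading on the sample; the helper name `trigger_spans` is hypothetical, and the overall layout of CSED.json (one JSON array versus one record per line) is not specified here, so the record is embedded directly.

```python
import json

# Sample record copied from the Instructions section above.
SAMPLE = r'''{"id": "62831821e846a66a184aef35", "sentence": "就在国泰航空乘客数据泄露的消息传出几周前，英国航空(British Airways)透露，在两周内，数十万乘客的信用卡信息被盗.", "tokens": ["就", "在", "国泰", "航空", "乘客", "数据", "泄露", "的", "消息", "传出", "几周", "前", "，", "英国", "航空", "(", "British", " ", "Airways", ")", "透露", "，", "在", "两周", "内", "，", "数十万", "乘客", "的", "信用卡", "信息", "被盗", "."], "trigger": "数据泄露", "trigger_positions": [[8, 12]], "eventype": "数据泄露", "eventype_id": 0}'''


def trigger_spans(record):
    """Return the substring of `sentence` covered by each trigger position.

    Assumption: each position is a [start, end) pair of character offsets.
    """
    sentence = record["sentence"]
    return [sentence[start:end] for start, end in record["trigger_positions"]]


record = json.loads(SAMPLE)
spans = trigger_spans(record)

# Each extracted span should equal the annotated trigger if the
# character-offset assumption holds for this record.
offsets_match = all(span == record["trigger"] for span in spans)
print(spans, offsets_match)
```

In this record both `trigger` and `eventype` are "数据泄露" (Data Breach), with `eventype_id` 0. The same check could be run over every record as a sanity test before training, once the file's actual layout is confirmed.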

Comments

It is designed for Chinese cybersecurity event detection.

Submitted by Bingzhi Xu on Mon, 08/12/2024 - 10:31

Dataset Files

CSED.json (997.72 kB)
