Standard Dataset
Chinese cybersecurity event dataset
- Submitted by: Bingzhi Xu
- Last updated: Mon, 08/12/2024 - 10:30
- DOI: 10.21227/z61y-9617
Abstract
This paper introduces a new dataset named CSED, designed for Chinese cybersecurity event detection (ED). The dataset draws on approximately 18,000 news articles related to cybersecurity. Following the classification of cybersecurity event types in CAISE [38], we define two event types, Attack and Vulnerability, and subdivide them into nine sub-event types: Data Breach, Phishing, Ransom, DDoS Attack, Malware, Supply Chain, Vulnerability Impact, Vulnerability Discovery, and Vulnerability Patch. Sentences that do not contain any of these events are labeled ‘NA’. The key to annotating cybersecurity events is identifying trigger words; carefully selected trigger words can significantly enhance the efficiency of subsequent event recognition. We establish rules for the annotation process: when a sentence contains multiple events of different types, only the most representative event is annotated. This approach avoids unnecessary redundancy and keeps the dataset refined. CSED includes 2,054 event instances, 2 event types, and 9 sub-types.
{"id": "62831821e846a66a184aef35", "sentence": "就在国泰航空乘客数据泄露的消息传出几周前,英国航空(British Airways)透露,在两周内,数十万乘客的信用卡信息被盗.", "tokens": ["就", "在", "国泰", "航空", "乘客", "数据", "泄露", "的", "消息", "传出", "几周", "前", ",", "英国", "航空", "(", "British", " ", "Airways", ")", "透露", ",", "在", "两周", "内", ",", "数十万", "乘客", "的", "信用卡", "信息", "被盗", "."], "trigger": "数据泄露", "trigger_positions": [[8, 12]], "eventype": "数据泄露", "eventype_id": 0}
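The sample record above can be read with a few lines of Python. The sketch below is illustrative only and is not part of the dataset: the helper name `trigger_spans` is hypothetical, the "tokens" field is omitted for brevity, and it assumes that "trigger_positions" holds [start, end) character offsets into "sentence". That reading is consistent with this one sample, but the page does not document it.

```python
import json

# Sample record copied from the dataset page; the "tokens" field is omitted for brevity.
SAMPLE = (
    '{"id": "62831821e846a66a184aef35", '
    '"sentence": "就在国泰航空乘客数据泄露的消息传出几周前,英国航空(British Airways)透露,'
    '在两周内,数十万乘客的信用卡信息被盗.", '
    '"trigger": "数据泄露", "trigger_positions": [[8, 12]], '
    '"eventype": "数据泄露", "eventype_id": 0}'
)


def trigger_spans(record):
    """Return (start, end, text) for each annotated trigger span.

    Assumption: each position pair is a [start, end) character offset
    into the "sentence" string, not a token index.
    """
    sentence = record["sentence"]
    return [(start, end, sentence[start:end])
            for start, end in record["trigger_positions"]]


record = json.loads(SAMPLE)
spans = trigger_spans(record)
for start, end, text in spans:
    # Flag any span whose text differs from the annotated trigger string.
    status = "ok" if text == record["trigger"] else "mismatch"
    print(start, end, text, status)
```

Running the same check over every record is a cheap way to confirm the offset convention before training a model on the spans.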
Comments
It is designed for Chinese cybersecurity event detection.