A semi real dataset of Meta-Alerts for APT attack detection
- Submitted by: Behrouz Tork Ladani
- Last updated: Tue, 05/17/2022 - 22:17
- DOI: 10.21227/bssq-k752
Abstract
One of the major challenges in evaluating proposed methods for detecting Advanced Persistent Threat (APT) attacks is that real environments, and even real datasets containing logs or events related to APT attacks, are very hard to obtain. Hence, designing a sound evaluation procedure is a real challenge. As far as we know, there are currently no datasets with labeled records of APT attacks. In fact, APT-specific data is very limited, and real-world datasets are very hard to construct. This is mostly due to the nature of APT attacks: they are usually multi-stage and hybrid, are designed for specific purposes, normally generate low-risk alerts that detection and monitoring systems therefore disregard, and generally disappear or change once their command and control (C&C) centers are revealed or identified. For these reasons, most work on detecting APT attacks uses its own synthesized datasets instead of real-world data.
To address this challenge, we provide a semi-real dataset that combines a real-world APT-attack-free alert dataset with a dataset extracted from descriptions of real-world APT attack scenarios. We call it semi-real because it is synthesized from two real-world datasets.
To generate a dataset containing both labeled Intrusion Kill Chain (IKC) meta-alerts and intact meta-alerts, we took real-world alerts generated by a commercial SIEM to create an APT-attack-free meta-alert dataset, and then randomly injected APT attack meta-alerts, derived from another dataset of real-world APT attack traces, as labeled APT attack meta-alerts. The attack-free meta-alerts were produced by Ravin, a SIEM system developed by PayamPardaz Company. This dataset is the output of running Ravin in a company with 250 hosts and contains 667K alerts generated from 3.1M investigated logs over a period of 3 days, from September 23rd to September 26th, 2018. Given the information available during and after this period, and the nature of the company's activity, we are confident that no APT attack was in progress during this period. Therefore, we assume that these alerts, although they may contain conventional attacks, are free of APT attacks.
For the attack data, we used the report of the third TC adversarial engagement program (2018), named TA5.1 Ground Truth Report Engagement 3. The published report includes 27 APT attack scenarios conducted from April 6th to April 13th, 2018, covering typical APT activities such as browser-induced drive-by initial compromise, backdoor injection, privilege escalation, internal reconnaissance, exfiltration of sensitive assets, and cleanup of attack footprints. These attacks used sophisticated attack vectors such as reflective loading, web-shell capabilities, and in-memory module loading.
The TA5.1 scenarios span a 7-day period of attacks, whereas the Ravin meta-alert dataset spans 3 days. We randomly injected the corresponding IKCs into the Ravin meta-alert dataset, assuming that a host is randomly selected as the candidate for injecting an attack scenario. The following steps were conducted for this purpose:
1. A random attack scenario in TA5.1 is selected and its corresponding IKC is derived.
2. The time values of all meta-alerts derived from the selected attack scenario are adapted. For this purpose, depending on the time frame in which the attack is to take place on the selected host, we randomly generate times for the TA5.1 attack vector and apply them to the corresponding records. For example, if the attack is to take place within 2 days, we generate a random time within a 2-day interval for each meta-alert.
3. The destination IP address values of all meta-alerts derived from the selected attack scenario are adapted. All of them are changed to the IP address of the selected host into which the attack is injected.
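The three steps above can be sketched roughly as follows. This is only an illustrative sketch, not the authors' actual tooling: the `MetaAlert` class, the `inject_scenario` function, and all parameter names are hypothetical. It also assumes that each scenario is stored as a list of meta-alerts in IKC order and that the generated times are sorted so this order is preserved, which the description does not state explicitly.

```python
import random
from dataclasses import dataclass, replace
from datetime import datetime, timedelta


@dataclass(frozen=True)
class MetaAlert:
    """Hypothetical record type mirroring the fields listed for the dataset."""
    created: datetime
    severity: int
    src_addr: str
    src_port: int
    dst_addr: str
    dst_port: int
    ikc_step: str  # empty string for unlabeled (attack-free) meta-alerts


def inject_scenario(host_alerts, scenarios, host_ip, window_start, window, rng):
    """Inject one randomly chosen attack scenario into a host's meta-alerts."""
    # Step 1: select a random attack scenario (a list of IKC meta-alerts).
    scenario = rng.choice(scenarios)

    # Step 2: draw one random time inside the attack window per meta-alert.
    # Sorting the offsets keeps the IKC steps in order (an assumption).
    span = window.total_seconds()
    offsets = sorted(rng.uniform(0, span) for _ in scenario)

    # Step 3: rewrite the destination address to the selected host's IP.
    injected = [
        replace(alert,
                created=window_start + timedelta(seconds=offset),
                dst_addr=host_ip)
        for alert, offset in zip(scenario, offsets)
    ]

    # Merge with the host's attack-free meta-alerts, ordered by time.
    return sorted(list(host_alerts) + injected, key=lambda a: a.created)
```

Passing a seeded `random.Random` instance as `rng` makes an injection run reproducible, which is convenient when regenerating the same labeled dataset.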
The dataset contains 250 text files. Each file holds the meta-alert records for one host in the system. Each meta-alert record contains Alert Creation Time, Alert Impact Severity, Source Address, Source Port, Target Address, Target Port, and IKC Step.
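As a rough illustration of how one host file might be read, the sketch below parses records into dictionaries and counts the labeled IKC steps. The on-disk layout is not documented here, so several things are assumptions: one record per line, fields separated by commas, fields appearing in the order listed above, and an empty IKC Step for unlabeled records. The names `FIELDS`, `read_meta_alerts`, and `ikc_step_counts` are hypothetical; adjust the delimiter and field order after inspecting the actual files.

```python
import csv
from collections import Counter

# Field names in the order listed in the description (order is an assumption).
FIELDS = [
    "creation_time", "impact_severity",
    "source_address", "source_port",
    "target_address", "target_port",
    "ikc_step",
]


def read_meta_alerts(lines, delimiter=","):
    """Parse an iterable of text lines into a list of meta-alert dicts.

    Values are kept as strings because the timestamp and severity
    encodings are not documented.
    """
    records = []
    for row in csv.reader(lines, delimiter=delimiter):
        if len(row) != len(FIELDS):
            continue  # skip blank or malformed lines
        records.append(dict(zip(FIELDS, (value.strip() for value in row))))
    return records


def ikc_step_counts(records):
    """Count records per IKC step, ignoring unlabeled (empty) ones."""
    return Counter(r["ikc_step"] for r in records if r["ikc_step"])
```

An open file object can be passed directly, for example `with open(path, newline="") as f: records = read_meta_alerts(f)`, where `path` points to one of the 250 host files.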
Comments
i am studing for apt, I hope for the datasets.