PWNJUTSU dataset

Name: PWNJUTSU dataset
Creator: Aimad Berady
License: https://creativecommons.org/licenses/by/4.0/
Keywords: Security

Citation Author(s):
Aimad BERADY (CentraleSupélec, Inria, Univ Rennes, CNRS, IRISA, F-35042 Rennes, France)
Mathieu JAUME (Sorbonne Université, CNRS, LIP6, F-75005 Paris, France)
Valérie VIET TRIEM TONG (CentraleSupélec, Inria, Univ Rennes, CNRS, IRISA, F-35042 Rennes, France)
Gilles GUETTE (Univ Rennes, Inria, CentraleSupélec, CNRS, IRISA, F-35042 Rennes, France)
Submitted by: Aimad Berady
Last updated: Fri, 06/24/2022 - 20:37
DOI: 10.21227/ngjr-kh92
Data Format: *.json, *.pcap
Research Article Link: PWNJUTSU: A Dataset and a Semantics-Driven Approach to Retrace Attack Campaigns
Links: Project website

Categories: Security
Keywords: Advanced Persistent Threat (APT), Tactics Techniques Procedures (TTP), SIEM, Red Team

Abstract

Identifying patterns in the modus operandi of attackers is an essential requirement in the study of Advanced Persistent Threats. Previous studies have been hampered by the lack of accurate, relevant, and representative datasets of current threats. System logs and network traffic captured during attacks on real companies’ information systems are the best data sources for building such datasets. Unfortunately, for obvious reasons of reputation, privacy, and security, companies seldom make such data available. This dataset results from an alternative approach to these data collection issues. In the PWNJUTSU experiment, 22 Red Teamers attacked a vulnerable infrastructure to compromise machines and steal secret flags. Each Red Teamer operated on a dedicated instance, and sensors captured system logs and network traffic on each of these instances.

Instructions:

Sensors deployed on the PWNJUTSU infrastructure recorded events, which we extracted and made available to the scientific community as a downloadable dataset. The data can also be browsed on our project website using the search engine provided. For each of the 22 participants in the experiment, the sensors produced the following files:

A JSON Lines file containing system logs from the three vulnerable machines. Overall, these files total more than 16 million events (n*-vm1: 9.2M events, n*-vm2: 50k events, n*-vm3: 7.2M events).
PCAP files containing raw network traffic captured on both inbound and outbound interfaces. Overall, these files total 172 GB of raw data and 17 GB of Zeek analysis results, corresponding to 45 million lines.
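
Because the system logs are distributed as JSON Lines, they can be processed one record at a time without loading a whole file into memory. The sketch below is a minimal illustration using only the Python standard library. It assumes one JSON object per line; the field name `host` and the sample records are hypothetical, since the event schema is not described on this page.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty JSON Lines record."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


def count_by_field(events: Iterable[dict], field: str) -> Counter:
    """Count events by the value of a top-level field.

    `field` is a hypothetical name; check the actual schema first.
    """
    return Counter(str(event.get(field, "<missing>")) for event in events)


# Hypothetical in-memory sample standing in for a real log file.
sample = [
    '{"host": "vm1", "id": 1}',
    '',
    '{"host": "vm1", "id": 2}',
    '{"id": 3}',
]
counts = count_by_field(iter_events(sample), "host")
print(counts)
```

On a real file, an open text file handle can be passed as `lines`, since iterating over a file yields one line at a time. The exact file names in the archive are not given here, so the path has to be adapted to the downloaded data.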

Additionally, we produced a reference instance (n99): the same monitored infrastructure, but without any malicious activity.
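
One simple way to use the n99 reference is to set aside, for a given participant, whatever also occurs in the attack-free instance. The sketch below is an assumption about how the baseline might be used, not the authors' method: it takes two counters of some event attribute (the attribute values shown are hypothetical) and keeps only the values never observed in the reference.

```python
from collections import Counter


def novel_values(participant: Counter, reference: Counter) -> dict:
    """Return values seen on a participant instance but never on the reference."""
    return {
        value: count
        for value, count in participant.items()
        if value not in reference
    }


# Hypothetical counts of some event attribute (e.g. a process name).
participant_counts = Counter({"sshd": 40, "cron": 12, "nc": 3})
reference_counts = Counter({"sshd": 35, "cron": 15})

suspicious = novel_values(participant_counts, reference_counts)
print(suspicious)
```

This difference is only a coarse filter: benign activity can differ between instances, and attackers can reuse programs that also appear in the baseline, so the result is a starting point for inspection and not a detection verdict.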