PWNJUTSU dataset

Citation Author(s):
CentraleSupélec, Inria, Univ Rennes, CNRS, IRISA, F-35042 Rennes, France
Sorbonne Université, CNRS, LIP6, F-75005 Paris, France
CentraleSupélec, Inria, Univ Rennes, CNRS, IRISA, F-35042 Rennes, France
Univ Rennes, Inria, CentraleSupélec, CNRS, IRISA, F-35042 Rennes, France
Submitted by:
Aimad Berady
Last updated:
Fri, 06/24/2022 - 16:37


Identifying patterns in attackers’ modus operandi is essential to the study of Advanced Persistent Threats. Previous studies have been hampered by the lack of accurate, relevant, and representative datasets of current threats. System logs and network traffic captured during attacks on real companies’ information systems are the best sources for building such datasets. Unfortunately, for obvious reasons of reputation, privacy, and security, companies seldom make such data available. This dataset is the result of an alternative approach to the data collection problem. In the PWNJUTSU experiment, 22 Red Teamers attacked a vulnerable infrastructure in order to compromise machines and steal secret flags. Each Red Teamer operated on a dedicated instance of the infrastructure, and sensors captured system logs and network traffic on each instance.


Sensors deployed on the PWNJUTSU infrastructure recorded events, which we extracted and made available to the scientific community as a downloadable dataset. The data can also be browsed on our project website using the search engine provided. For each of the 22 participants in the experiment, the sensors produced the following files:

  • JSON Lines file containing system logs from the three vulnerable machines. In total, this represents more than 16 million events (n*-vm1: 9.2M events, n*-vm2: 50k events, n*-vm3: 7.2M events).
  • PCAP files containing raw network traffic captured on both inbound and outbound interfaces. In total, this represents 172 GB of raw data and 17 GB of Zeek analysis results, corresponding to 45 million lines.
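Because the system logs are distributed as JSON Lines (one JSON object per line), they can be streamed rather than loaded into memory at once, which matters at this volume. The following is a minimal sketch using only the Python standard library. The function names and the field name passed to `count_by_field` are hypothetical: the dataset's actual file names and event schema are not described here and should be checked against the downloaded files.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Iterator


def iter_events(path: Path) -> Iterator[dict]:
    """Yield one decoded event per non-empty line of a JSON Lines file."""
    with path.open("r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)


def count_by_field(path: Path, field: str) -> Counter:
    """Count events by the value of a top-level field.

    `field` is a placeholder: substitute a key that actually exists in
    the logs. Events lacking the field are counted under None, and
    nested values are serialized so they can be used as counter keys.
    """
    counts: Counter = Counter()
    for event in iter_events(path):
        value = event.get(field)
        if isinstance(value, (dict, list)):
            value = json.dumps(value, sort_keys=True)
        counts[value] += 1
    return counts
```

A call such as `count_by_field(Path("events.jsonl"), "host")` (both arguments are illustrative) would return a per-value event count without holding the whole file in memory.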

Additionally, we produced a reference instance (n99): the same monitored infrastructure, but without any malicious activity.
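One possible use of the reference instance is as a baseline: values that appear in a participant's logs but never in the n99 logs are candidates for closer inspection. The sketch below illustrates that idea under stated assumptions. The helper names and the choice of field are hypothetical, the event schema is not specified here, and absence from the baseline is only a first-pass filter, not evidence of malicious activity.

```python
import json
from pathlib import Path
from typing import Set


def field_values(path: Path, field: str) -> Set[str]:
    """Collect the distinct values of a top-level field in a JSON Lines file.

    Values are converted to strings so that they can be stored in a set.
    """
    values: Set[str] = set()
    with path.open("r", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():  # skip blank lines
                event = json.loads(line)
                if field in event:
                    values.add(str(event[field]))
    return values


def unseen_in_reference(participant: Path, reference: Path, field: str) -> Set[str]:
    """Return values present in the participant log but absent from the reference log."""
    return field_values(participant, field) - field_values(reference, field)
```

The set difference keeps the comparison simple. For fields with very many distinct values, a frequency-based comparison may be more informative than a presence check.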