Advanced Persistent Threat (APT) Classified Dataset

Citation Author(s): Hassan Mahmood (Air University Islamabad)
Submitted by: Hassan Mahmood
Last updated: Mon, 05/06/2024 - 19:41
DOI: 10.21227/fav6-4j56
Data Format: Image files (.png)

Keywords:

Zero-Shot Learning

Advanced Persistent Threat

Cyber Attacks

Malware Classification

PE Malware

Machine Learning

Deep Learning

Artificial Intelligence

Abstract

Images are widely used in deep learning because they carry rich information and spatial structure, which convolutional networks exploit through hierarchical features and approximate translation invariance; this makes them well suited to tasks such as object recognition and classification. Image-based malware classification, in which each binary is represented as an image, brings these strengths to cybersecurity. The Advanced Persistent Threat (APT) Classified Dataset is a curated collection intended to support research in this area. It comprises two subsets: one containing samples attributed to twelve prominent APT groups, and another organizing APT samples by year from 2011 to 2023. The group subset lends itself to studying individual threat actors, while the yearly subset lends itself to studying how APT samples change over time. Within each subset, samples are further divided into subclasses using the average hash (aHash) perceptual hashing technique. The dataset is intended primarily to advance malware classification methods, particularly image-based deep learning approaches, and thereby to strengthen defenses against evolving cyber threats.
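The abstract names aHash but does not describe how it is applied. For reference, the standard average-hash algorithm downsamples a grayscale image to a small grid (commonly 8x8), computes the mean intensity, and sets one bit per cell according to whether that cell is brighter than the mean; visually similar images then have hashes with a small Hamming distance. The pure-Python sketch below assumes the image is already loaded as a 2D list of grayscale values (for example with Pillow) and uses block averaging for the downsampling step. The function names are illustrative, and this is not necessarily the exact variant used to build the dataset. Libraries such as `imagehash` provide an `average_hash` function that resizes with interpolation instead, so their hashes may differ slightly.

```python
from typing import List

# A grayscale image as rows of pixel intensities (0-255).
Grid = List[List[float]]


def downscale(gray: Grid, size: int = 8) -> Grid:
    """Shrink a grayscale image to size x size by averaging rectangular blocks."""
    height, width = len(gray), len(gray[0])
    if height < size or width < size:
        raise ValueError("image must be at least size x size pixels")
    small: Grid = []
    for i in range(size):
        r0, r1 = i * height // size, (i + 1) * height // size
        row = []
        for j in range(size):
            c0, c1 = j * width // size, (j + 1) * width // size
            block = [gray[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def average_hash(gray: Grid, size: int = 8) -> int:
    """Return a size*size-bit aHash: bit = 1 where the cell is brighter than the mean."""
    cells = [p for row in downscale(gray, size) for p in row]
    mean = sum(cells) / len(cells)
    bits = 0
    for p in cells:  # row-major order; the first cell becomes the most significant bit
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes (0 means identical)."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    # Synthetic 16x16 image: dark left half, bright right half.
    img = [[0] * 8 + [255] * 8 for _ in range(16)]
    brighter = [[min(p + 20, 255) for p in row] for row in img]
    h1, h2 = average_hash(img), average_hash(brighter)
    print(f"{h1:016x}", hamming(h1, h2))  # prints: 0f0f0f0f0f0f0f0f 0
```

Because the hash only records which cells are brighter than the mean, a uniform brightness shift leaves it unchanged. Grouping samples whose hashes are identical, or within a small Hamming distance, is one plausible way to form subclasses like those in the dataset; any threshold the author used is not documented.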

Instructions:

The dataset is distributed as a ZIP archive protected with the password "infected". After extraction, the main folder contains one folder per group, named after either the APT group or the collection year of the samples. Each group folder contains sub-folders that subdivide its samples by aHash, giving a finer-grained organization. This hierarchy makes the dataset easy to navigate and allows specific subsets to be selected directly for analysis.
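The following sketch extracts the archive and indexes the images by group and aHash subclass. It assumes the layout `<main folder>/<group>/<subclass>/<image>.png` described above; if the archive adds another level (for example, separate folders for the APT-group and yearly subsets), pass the folder directly above the group folders as `root`. The function names, and the heuristic that year folders are named with a four-digit year, are illustrative assumptions rather than part of the dataset's documentation. Note that Python's `zipfile` can only decrypt legacy ZipCrypto entries; if the archive uses AES encryption, an external tool such as 7-Zip is needed for extraction.

```python
import zipfile
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Tuple


def extract_dataset(zip_path: str, out_dir: str, password: str = "infected") -> None:
    """Extract the archive with its documented password (ZipCrypto only, see above)."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out_dir, pwd=password.encode("utf-8"))


def is_year_group(name: str) -> bool:
    """Heuristic (assumption): yearly-subset folders are named with a 4-digit year."""
    return len(name) == 4 and name.isdigit()


def index_samples(root: str) -> Dict[Tuple[str, str], List[Path]]:
    """Map (group, aHash subclass) -> sorted PNG paths.

    Assumes the layout <root>/<group>/<subclass>/<image>.png; other files are skipped.
    """
    index: Dict[Tuple[str, str], List[Path]] = defaultdict(list)
    for path in sorted(Path(root).glob("*/*/*")):
        if path.is_file() and path.suffix.lower() == ".png":
            index[(path.parent.parent.name, path.parent.name)].append(path)
    return dict(index)
```

After extraction, calling `index_samples` on the main folder yields one entry per (group, subclass) pair, which can be filtered with `is_year_group` to separate the yearly subset from the APT-group subset or used to build train/test splits per subclass.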