Maldeb Dataset

Citation Author(s):
Setia Juli Irzal
Institut Teknologi Bandung, Telkom University
Submitted by:
Setia Ismail
Last updated:
Tue, 04/23/2024 - 00:16
Data Format:
Research Article Link:


Image representation of a malware-benign dataset. The malware samples were compiled from several malware repositories: The Malware-Repo, TheZoo, MalwareBazaar, Malware Database, and TekDefense. The benign samples were sourced from system applications of Microsoft Windows 10 and 11, as well as open-source software repositories such as SourceForge, PortableFreeware, CNET, and FileForum. All samples were validated by scanning them with the VirusTotal malware scanning service. The samples were pre-processed by transforming each binary into a grayscale image following the rules from Nataraj (2011). Nataraj Paper:

Malware and benign samples were collected by Debi Amalia Septiyani and Halimul Hakim Khairul. D. A. Septiyani, “Generating Grayscale and RGB Images dataset for windows PE malware using Gist Features extaction method,” Institut Teknologi Bandung, 2022, and Dani Agung Prastiyo, "Design and implementation of a machine learning-based malware classification system with an audio signal feature Analysis Approach," Institut Teknologi Bandung, 2023.
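
The binary-to-grayscale step described above can be sketched as follows. This is a minimal illustration, not the authors' actual pre-processing script: each byte of the file is treated as one 8-bit pixel, and the image width is chosen from the file size. The width table is the one commonly quoted from Nataraj et al. (2011) and should be checked against the paper; zero-padding the last row and the function names `image_width` and `bytes_to_grayscale` are assumptions made for this example.

```python
import math

# Width table as commonly quoted from Nataraj et al. (2011); treat it as an
# assumption. Each entry is (upper file-size bound in KiB, image width in pixels).
WIDTH_TABLE = [
    (10, 32), (30, 64), (60, 128), (100, 256),
    (200, 384), (500, 512), (1000, 768),
]
MAX_WIDTH = 1024  # used for files of 1000 KiB and larger


def image_width(size_bytes):
    """Pick the image width from the file size."""
    size_kib = size_bytes / 1024
    for upper_kib, width in WIDTH_TABLE:
        if size_kib < upper_kib:
            return width
    return MAX_WIDTH


def bytes_to_grayscale(data):
    """Arrange raw bytes into rows of pixel values in the range 0..255.

    The last row is zero-padded (an assumption; some implementations
    truncate the incomplete row instead).
    """
    if not data:
        return []
    width = image_width(len(data))
    height = math.ceil(len(data) / width)
    padded = data.ljust(width * height, b"\x00")
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]
```

The returned list of rows can be handed to any imaging library to save an 8-bit grayscale picture; a file under 10 KiB, for example, becomes an image 32 pixels wide.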


This dataset contains 20,854 samples in two classes: 10,427 malware samples and 10,427 benign samples. To use the dataset, unzip the files. We have tested the dataset for malware classification with self-supervised learning.
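
After unzipping, a quick sanity check is to count the images per class and compare the result with the figures above. The sketch below assumes that each image sits in a folder named after its class and that the images use one of the listed file extensions. The archive layout is not documented here, so both are assumptions, and `extract_and_count` is a hypothetical helper name.

```python
import zipfile
from collections import Counter
from pathlib import Path

# Assumed image extensions; adjust to whatever the archive actually contains.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp"}


def extract_and_count(archive_path, out_dir):
    """Unzip the archive and count image files per parent folder name."""
    out = Path(out_dir)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(out)
    counts = Counter()
    for path in out.rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            # Assumption: the containing folder name is the class label.
            counts[path.parent.name] += 1
    return dict(counts)
```

If the layout assumption holds, the returned dictionary should show two classes with 10,427 images each.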

Funding Agency: 
Directorate General of Higher Education, Ministry of Education and Culture, Republic of Indonesia; JASSO (Japan Student Services Organization)