Malware API Call Dataset

Citation Author(s): Ferhat Ozgur Catak
Submitted by: Ferhat Ozgur Catak
Last updated: Tue, 07/30/2019 - 11:07
DOI: 10.21227/crfp-kd68
License: Creative Commons Attribution

Our public malware dataset was generated with Cuckoo Sandbox from an analysis of Windows OS API calls and is intended for cyber security researchers.

Cite The Dataset

If you find these results useful, please cite them:

@misc{ mal-api-2019,
author = "Catak, FÖ. and Yazi, AF.",
title = "A Benchmark API Call Dataset for Windows PE Malware Classification",
year = "2019",
url = "https://arxiv.org/abs/1905.01999",
note = "[arXiv:1905.01999 ]"
}

Publications

The details of the Mal-API-2019 dataset are published in the following papers:

  • AF. Yazı, FÖ. Çatak, E. Gül, Classification of Metamorphic Malware with Deep Learning (LSTM), IEEE Signal Processing and Applications Conference, 2019.
  • FÖ. Çatak, AF. Yazı, A Benchmark API Call Dataset for Windows PE Malware Classification, arXiv:1905.01999, 2019.

Introduction

This study aims to provide data that helps close gaps in machine-learning-based malware research. Its specific objective is to build a benchmark dataset of Windows operating system API calls made by various malware. This is the first study to use metamorphic malware to build sequences of API calls. We hope that this research will contribute to a deeper understanding of how metamorphic malware changes its behavior (i.e., its API calls) by adding meaningless opcodes with its own disassembler/assembler parts.

Malware Types and System Overall

In our research, we mapped the family labels produced by each antivirus software onto 8 main malware families: Trojan, Backdoor, Downloader, Worms, Spyware, Adware, Dropper, and Virus. Table 1 shows the number of samples in each malware family in our dataset. As the table shows, the sample counts of all families except Adware are fairly close to each other. The difference arises because we found few samples from the Adware family.

The figure shows the general flow of generating the malware dataset. As shown in the figure, we obtained the MD5 hash values of the malware we collected from GitHub. We searched for these hash values using the VirusTotal API and obtained the families of these malicious programs from the reports of 67 different antivirus software products in VirusTotal. We observed that the malware families reported by these 67 antivirus products differ from one another.
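Because the antivirus reports disagree, some rule is needed to reduce many vendor labels to one of the 8 families. The source does not state the exact rule, so the following is only a minimal sketch under stated assumptions: it hashes a sample with MD5 and then takes a majority vote over vendor labels. The keyword mapping `FAMILY_KEYWORDS`, the helper names, and the example labels are hypothetical, and the VirusTotal lookup itself is left out.

```python
import hashlib
from collections import Counter
from typing import Iterable, Optional

# Hypothetical keyword-to-family mapping (not taken from the source).
# More specific keywords come first, so that a label such as
# "Trojan-Downloader" is counted as Downloader rather than Trojan.
FAMILY_KEYWORDS = {
    "downloader": "Downloader",
    "dropper": "Dropper",
    "backdoor": "Backdoor",
    "spy": "Spyware",
    "adware": "Adware",
    "worm": "Worms",
    "virus": "Virus",
    "trojan": "Trojan",
}


def md5_of_bytes(data: bytes) -> str:
    """Return the hex MD5 digest used as the lookup key for a sample."""
    return hashlib.md5(data).hexdigest()


def normalize_label(label: str) -> Optional[str]:
    """Map one vendor label to one of the 8 families, or None if no keyword matches."""
    lowered = label.lower()
    for keyword, family in FAMILY_KEYWORDS.items():
        if keyword in lowered:
            return family
    return None


def majority_family(labels: Iterable[str]) -> Optional[str]:
    """Pick the family named by the most vendors (assumed reconciliation rule)."""
    votes = Counter(
        family for family in map(normalize_label, labels) if family is not None
    )
    if not votes:
        return None
    # Ties are resolved in favor of the family that was counted first.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Made-up vendor labels for illustration only.
    example_labels = [
        "Worm.Win32.Allaple",
        "W32/Allaple.worm",
        "Trojan.Generic",
        "Heuristic.Suspicious",
    ]
    print(md5_of_bytes(b"example sample bytes"))
    print(majority_family(example_labels))
```

A majority vote is simple, but it discards how strongly the vendors disagree; keeping the full vote counts alongside the chosen family would preserve that information.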

Table 1. Number of samples per malware family.

| Malware Family | Samples | Description |
|---|---|---|
| Spyware | 832 | Enables a user to obtain covert information about another's computer activities by transmitting data covertly from their hard drive. |
| Downloader | 1001 | Shares the primary functionality of downloading content. |
| Trojan | 1001 | Misleads users of its true intent. |
| Worms | 1001 | Spreads copies of itself from computer to computer. |
| Adware | 379 | Hides on your device and serves you advertisements. |
| Dropper | 891 | Surreptitiously carries viruses, back doors and other malicious software so they can be executed on the compromised machine. |
| Virus | 1001 | Designed to spread from host to host and has the ability to replicate itself. |
| Backdoor | 1001 | A technique in which a system security mechanism is bypassed undetectably to access a computer or its data. |
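The per-family counts above can be turned into class shares to quantify how under-represented Adware is. The sketch below also derives inverse-frequency class weights, a common remedy for imbalance when training a classifier; this weighting is an illustration and is not something the source reports using. The function names are hypothetical.

```python
# Sample counts copied from the table of malware families.
SAMPLES = {
    "Spyware": 832,
    "Downloader": 1001,
    "Trojan": 1001,
    "Worms": 1001,
    "Adware": 379,
    "Dropper": 891,
    "Virus": 1001,
    "Backdoor": 1001,
}


def class_shares(counts: dict) -> dict:
    """Fraction of the dataset that belongs to each family."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def balanced_weights(counts: dict) -> dict:
    """Inverse-frequency weights: total / (number_of_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {name: total / (n_classes * n) for name, n in counts.items()}


if __name__ == "__main__":
    shares = class_shares(SAMPLES)
    weights = balanced_weights(SAMPLES)
    print("total samples:", sum(SAMPLES.values()))
    for name in sorted(SAMPLES, key=SAMPLES.get):
        print(f"{name:<10} share={shares[name]:.3f} weight={weights[name]:.3f}")
```

With these weights, a family that holds exactly one eighth of the samples gets weight 1, so Adware receives the largest weight and the 1001-sample families receive weights slightly below 1.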
