Malware Analysis Datasets: Top-1000 PE Imports

Citation Author(s):
Angelo Oliveira
Submitted by:
Angelo Oliveira
Last updated:
Fri, 11/08/2019 - 05:43
DOI:
10.21227/004e-v304

Abstract 

This dataset is part of my PhD research on malware detection and classification using Deep Learning. It contains static analysis data: Top-1000 imported functions extracted from the 'pe_imports' elements of Cuckoo Sandbox reports. PE malware examples were downloaded from virusshare.com. PE goodware examples were downloaded from portableapps.com and from Windows 7 x86 directories.
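The extraction described above can be sketched roughly as follows. This is an illustration, not the author's actual pipeline. It assumes each Cuckoo report is a parsed JSON dictionary whose 'pe_imports' list sits under a "static" key, with one entry per DLL and an "imports" list of named functions, and it assumes "most imported" means "imported by the largest number of examples". The helper names (imported_names, top_n_features, to_binary_row) are hypothetical.

```python
from collections import Counter

def imported_names(report):
    """Collect the set of function names imported by one example.

    Assumes the layout report["static"]["pe_imports"], a list of
    {"dll": ..., "imports": [{"name": ...}, ...]} entries.
    """
    names = set()
    for entry in report.get("static", {}).get("pe_imports", []):
        for imp in entry.get("imports", []):
            if imp.get("name"):
                names.add(imp["name"])
    return names

def top_n_features(reports, n):
    """Rank functions by how many examples import them (assumed criterion)."""
    counts = Counter()
    for report in reports:
        counts.update(imported_names(report))
    # Ties are broken alphabetically so the ranking is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:n]]

def to_binary_row(report, features):
    """Encode one example as 0 (not imported) / 1 (imported) per feature."""
    present = imported_names(report)
    return [1 if name in present else 0 for name in features]

# Two toy reports standing in for real Cuckoo output.
report_a = {"static": {"pe_imports": [
    {"dll": "KERNEL32.dll",
     "imports": [{"name": "GetProcAddress"}, {"name": "LoadLibraryA"}]},
]}}
report_b = {"static": {"pe_imports": [
    {"dll": "KERNEL32.dll", "imports": [{"name": "GetProcAddress"}]},
]}}

features = top_n_features([report_a, report_b], n=2)
row_b = to_binary_row(report_b, features)
```

With n=1000 and the full set of reports, the same steps would yield one binary row per example, matching the column layout described in the instructions.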

Instructions: 

* FEATURES *

Column name: hash
Description: MD5 hash of the example
Type: 32-character hexadecimal string

Column name: GetProcAddress
Description: Most imported function (1st)
Type: 0 (Not imported) or 1 (Imported)

...

Column name: LookupAccountSidW
Description: Least imported function (1000th)
Type: 0 (Not imported) or 1 (Imported)

Column name: malware
Description: Class
Type: 0 (Goodware) or 1 (Malware)
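
As a minimal sketch of how this layout could be consumed, the snippet below splits a CSV with these columns into hashes, binary import features, and class labels using only the Python standard library. It assumes the dataset is distributed as a comma-separated file with a header row; the load_imports helper and the two-row sample are hypothetical stand-ins, not part of the dataset.

```python
import csv
import io

def load_imports(csv_file):
    """Split the table into feature names, hashes, feature rows, and labels."""
    reader = csv.DictReader(csv_file)
    # Every column except 'hash' and 'malware' is a binary import feature.
    feature_names = [c for c in reader.fieldnames if c not in ("hash", "malware")]
    hashes, X, y = [], [], []
    for row in reader:
        hashes.append(row["hash"])
        X.append([int(row[name]) for name in feature_names])
        y.append(int(row["malware"]))
    return feature_names, hashes, X, y

# Synthetic sample with only two of the 1000 feature columns.
sample = io.StringIO(
    "hash,GetProcAddress,LookupAccountSidW,malware\n"
    "0123456789abcdef0123456789abcdef,1,0,1\n"
    "fedcba9876543210fedcba9876543210,1,1,0\n"
)
feature_names, hashes, X, y = load_imports(sample)
```

For the real file, pass an open file handle (for example `open(path, newline="")`) in place of the in-memory sample.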

* ACKNOWLEDGMENTS *

We would like to thank: Cuckoo Sandbox for developing such an amazing dynamic analysis environment!
VirusShare! Because sharing is caring!
Universidade Nove de Julho for supporting this research.
Coordination for the Improvement of Higher Education Personnel (CAPES) for supporting this research.

* CITATIONS *

Please refer to the dataset DOI.
Please feel free to contact me for any further information.

Comments

Greetings, I would like to get more information about the citations of this dataset.

Submitted by Noran Abu Shaib on Mon, 01/27/2020 - 11:47

Very interesting dataset. Do you have a file describing the dataset, specifically what the features mean?

Submitted by zakaria sadelaoud on Sun, 03/08/2020 - 16:44

Appreciated. Please share the feature explanation file or description. The dataset is useless until the features are explained explicitly.

Submitted by Muhammad Hanif on Mon, 04/20/2020 - 13:19

Hi, do you have a description of those features?

I would appreciate it if you could share it.

Thanks & regards.

Submitted by Saba iqbal on Wed, 07/29/2020 - 06:18

Please share a description of all 1000-odd features. I would appreciate it if you could list the top 10 or 20 most influential features. Thanks.

Submitted by solomon raj on Sat, 05/23/2020 - 12:56

Hello,
Dear friend, is it possible to send additional descriptions of the features?
Thanks

Submitted by Ali Kardani on Mon, 10/19/2020 - 09:40
