Quynh Trinh, tvquynh@gmail.com

This dataset is part of my Master's research on malware detection and classification using the XGBoost library on NVIDIA GPUs. It contains 1.55 million samples, each described by 1,000 API import features extracted from the JSONL files of the EMBER 2017 v2 and 2018 datasets. All data has been pre-processed, and duplicate records have been removed. The dataset contains 800,000 malware and 750,000 "goodware" samples.
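The extraction step described above can be illustrated with a minimal sketch. It is not the author's actual pipeline. It assumes the EMBER raw-feature layout, in which each JSONL record carries a `label` field (with -1 marking unlabeled samples) and an `imports` field mapping each DLL name to a list of imported function names. The helper names (`load_records`, `api_set`, `build_dataset`), the `dll:function` naming scheme, the choice of the most frequent imports as the vocabulary, and the binary encoding are all hypothetical choices made for illustration.

```python
import json
from collections import Counter


def load_records(lines):
    """Parse EMBER-style JSONL lines, keeping only labeled samples (0 or 1)."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        if rec.get("label", -1) in (0, 1):
            records.append(rec)
    return records


def api_set(rec):
    """Flatten the per-DLL import lists into a set of 'dll:function' names."""
    imports = rec.get("imports", {})
    return {f"{dll.lower()}:{fn}" for dll, fns in imports.items() for fn in fns}


def build_dataset(records, top_k=1000):
    """Return (vocab, rows, labels) with binary features and duplicates dropped."""
    counts = Counter()
    for rec in records:
        counts.update(api_set(rec))
    # Assumption: keep the top_k most frequent imports; ties broken by name.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    vocab = [name for name, _ in ranked[:top_k]]

    seen, rows, labels = set(), [], []
    for rec in records:
        apis = api_set(rec)
        row = tuple(int(name in apis) for name in vocab)
        key = (row, rec["label"])
        if key in seen:  # identical feature vector and label: treat as duplicate
            continue
        seen.add(key)
        rows.append(row)
        labels.append(rec["label"])
    return vocab, rows, labels


# Tiny made-up sample standing in for EMBER JSONL lines.
sample = [
    '{"sha256": "a", "label": 1, "imports": {"KERNEL32.dll": ["CreateFileA", "WriteFile"]}}',
    '{"sha256": "b", "label": 0, "imports": {"KERNEL32.dll": ["CreateFileA"]}}',
    '{"sha256": "c", "label": 0, "imports": {"kernel32.dll": ["CreateFileA"]}}',
    '{"sha256": "d", "label": -1, "imports": {"USER32.dll": ["MessageBoxA"]}}',
]
vocab, rows, labels = build_dataset(load_records(sample), top_k=2)
```

In this toy run the unlabeled record is skipped and the third record is dropped as a duplicate of the second, leaving two rows. The resulting `rows` and `labels` could then be converted to arrays and passed to a classifier such as XGBoost.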


[1] Quynh Trinh, "1.55M API IMPORT DATASET for MALWARE ANALYSIS", IEEE Dataport, 2021. [Online]. Available: http://dx.doi.org/10.21227/98jc-y909. Accessed: Jan. 15, 2025.
@data{98jc-y909-21,
  doi = {10.21227/98jc-y909},
  url = {http://dx.doi.org/10.21227/98jc-y909},
  author = {Quynh Trinh},
  publisher = {IEEE Dataport},
  title = {1.55M API IMPORT DATASET for MALWARE ANALYSIS},
  year = {2021}
}