Open Access
Botnet DGA Dataset
- Submitted by: Hatma Suryotrisongko
- Last updated: Tue, 06/29/2021 - 23:19
- DOI: 10.21227/rg6z-z622
Abstract
This is the dataset used in our journal paper, submitted to IEEE Transactions on Dependable and Secure Computing, Special Issue on Explainable Artificial Intelligence for Cyber Threat Intelligence (XAI-CTI) Applications.
Submitted = 12-Jan-2021 (under review)
ID = TDSCSI-2021-01-0045
Source code is available at IEEE Code Ocean:
Hatma Suryotrisongko (2020) Botnet DGA [Source Code]. https://doi.org/10.24433/CO.4005597.v2
Size: 205 MB
Platform: all platforms (CSV file)
Environment: all environments (CSV file)
Major component description: We adopted Patsakis’ approach on the dataset used in [82], using the Alexa top 1M domains and 10 botnet DGAs (1,803,333 domain names in total) published by Abakumov as the ground-truth dataset for botnet detection (https://github.com/andrewaeva/DGA). The dataset contains the feature columns CharLength, TreeNewFeature, nGramReputation_Alexa, REAlexa, MinREBotnets, Entropy, and InformationRadius. Please read our paper to understand how each is calculated.
Detailed setup instructions: Copy the CSV file and use it as needed.
Detailed run instructions: Please note that the columns are: Entropy,REAlexa,MinREBotnets,InformationRadius,CharLength,TreeNewFeature,nGramReputation_Alexa,Class
Output description: Botnet DGA dataset
Contact information: Hatma Suryotrisongko
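The column list given in the run instructions can be checked against a downloaded file before any analysis. The following is a minimal sketch using only the Python standard library. The helper name `load_rows` is hypothetical, and the single data row is a synthetic placeholder used only to make the example runnable; it is not taken from the dataset.

```python
import csv
import io

# Column names as listed in the run instructions above.
EXPECTED_COLUMNS = [
    "Entropy", "REAlexa", "MinREBotnets", "InformationRadius",
    "CharLength", "TreeNewFeature", "nGramReputation_Alexa", "Class",
]

def load_rows(handle):
    """Read all rows from an open CSV handle after checking the header.

    Hypothetical helper: raises ValueError if any expected column is absent.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)

# Synthetic placeholder row (NOT real dataset values), used only so the
# sketch runs without the actual file.
sample = io.StringIO(
    ",".join(EXPECTED_COLUMNS) + "\n"
    "0,0,0,0,0,0,0,placeholder\n"
)
rows = load_rows(sample)
print(len(rows), sorted(rows[0].keys()))
```

With the real file, the same helper could be called as `load_rows(open(path, newline=""))`, where `path` is wherever the downloaded CSV was saved. Values are returned as strings, so numeric columns would still need converting before use.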
Documentation
Attachment | Size |
---|---|
README.txt | 1.15 KB |
Comments
It'd be better if you include the domains in the dataset.