Standard Dataset
Malicious and Benign Websites
- Citation Author(s): Christian Urcuqui, Andrés Navarro, José Osorio, Melisa García
- Submitted by: Christian Urcuqui
- Last updated: Thu, 11/08/2018 - 10:34
- DOI: 10.21227/H26Q1T
Abstract
One important task is to build a good set of characteristics of malicious websites, because it is difficult to find one that is up to date and supported by research work.
This dataset is another research output of my bachelor's students. It is the result of a project that evaluated classification models for predicting whether websites are malicious or benign from their application-layer and network characteristics. The data were obtained through a process that drew on different sources of benign and malicious URLs. All of the URLs were verified and then used in a low-interaction client honeypot to capture their network traffic. We also used other tools to collect further information, such as the server country via Whois.
This is the first version. Some results of applying machine learning classifiers are reported in a bachelor's thesis and in an article, and the full data processing and data description are given in those works. I may provide a summary of them on this page in the coming days.
If your papers or other works use our dataset, please cite our paper as follows: Urcuqui, C., Navarro, A., Osorio, J., & García, M. (2017). Machine Learning Classifiers to Detect Malicious Websites. CEUR Workshop Proceedings. Vol 1950, 14-17.
If you need an article on the state of the art in website cybersecurity, one is available in English and Spanish: Urcuqui, C., Peña, M. G., Quintero, J. L. O., & Cadavid, A. N. (2017). Antidefacement. Sistemas & Telemática, 14(39), 9-27.
If you have any questions or feedback, please do not hesitate to write to the following email address: