Pristine and Malicious URLs

Citation Author(s):
Ehsan Nowroozi, Queen's University Belfast
Abhishek, MNNIT Allahabad, India
Mohammadreza Mohammadi, University of Padua, Italy
Mauro Conti, University of Padua, Italy
Submitted by:
Ehsan Nowroozi
Last updated:
Mon, 11/06/2023 - 09:43
DOI:
10.21227/2ph5-xc09

Abstract 

The goal of our research is to identify malicious advertisement URLs and to apply adversarial attacks to ensemble models. We extract lexical and web-scraped features from the URLs using Python code. Four machine learning algorithms are then applied for classification, and K-Means clustering is used for visual understanding of the data. We check the vulnerability of the models to adversarial examples by applying the Zeroth Order Optimization (ZOO) adversarial attack and computing the attack accuracy.
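As a rough illustration of the zeroth-order idea behind such an attack, the sketch below estimates gradients from model outputs alone (symmetric finite differences) and uses them to push a sample below the decision threshold. It is not the authors' implementation: `toy_malicious_score`, its weights, the step size, and the 0.5 threshold are all hypothetical, and this simplified variant updates every coordinate per step rather than using the coordinate-wise optimizer of the published ZOO attack.

```python
import math

def toy_malicious_score(x):
    # Hypothetical black-box model: a logistic score over fixed weights.
    # The attacker only calls this function and never reads the weights.
    weights = [0.8, -0.5, 1.2]
    z = sum(w * v for w, v in zip(weights, x)) - 0.2
    return 1.0 / (1.0 + math.exp(-z))

def estimate_gradient(f, x, h=1e-4):
    # Symmetric difference quotient per coordinate: (f(x+h) - f(x-h)) / 2h.
    grad = []
    for i in range(len(x)):
        plus = list(x)
        plus[i] += h
        minus = list(x)
        minus[i] -= h
        grad.append((f(plus) - f(minus)) / (2 * h))
    return grad

def zoo_style_attack(f, x, step=0.5, iters=100):
    # Descend on the estimated gradient until the score drops below 0.5
    # (assumed decision threshold) or the iteration budget is exhausted.
    x_adv = list(x)
    for _ in range(iters):
        if f(x_adv) < 0.5:
            break
        g = estimate_gradient(f, x_adv)
        x_adv = [v - step * gi for v, gi in zip(x_adv, g)]
    return x_adv

sample = [1.0, 0.0, 1.0]
adversarial = zoo_style_attack(toy_malicious_score, sample)
print(toy_malicious_score(sample), toy_malicious_score(adversarial))
```

Note that this perturbs real-valued feature vectors; whether a perturbed vector still corresponds to a valid URL is a separate question that the sketch does not address.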

Datasets are taken from different sources available on the internet. We considered 12 different datasets, 6 of malicious URLs and 6 of benign URLs. Together they include about 3,980,870 URLs. We extracted 89 lexical and web-scraped features for the subsequent tasks.
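To give a sense of what lexical features of a URL can look like, here is a minimal sketch using only the Python standard library. The ten features and their names are illustrative assumptions; they are not the 89 features shipped with this dataset, and `lexical_features` is a hypothetical helper.

```python
import re
from urllib.parse import urlparse, parse_qs

# Matches a dotted-quad host such as 192.168.0.1 (no range check on octets).
IPV4_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")

def lexical_features(url: str) -> dict:
    """Compute a few string-based features without fetching the URL."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": url.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_hyphens": url.count("-"),
        "has_at_symbol": int("@" in url),
        "host_is_ipv4": int(bool(IPV4_RE.match(host))),
        "uses_https": int(parsed.scheme == "https"),
        "path_depth": len([p for p in parsed.path.split("/") if p]),
        "num_query_params": len(parse_qs(parsed.query)),
    }

features = lexical_features("http://192.168.0.1/a/b/login.php?id=1&x=2")
print(features)
```

Web-scraped features, by contrast, require retrieving the page behind the URL and are not covered by this sketch.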

 

Instructions: 

 

The experimental setup for advertising URLs draws on 12 distinct datasets comprising 3,980,870 URLs. These datasets contain two kinds of URLs: benign and malicious. The malicious URL datasets further include four distinct sub-categories: spam, defacement, malware, and phishing. We also examined all of the URLs using the VirusTotal tool to confirm their authenticity.

Comments

Thank you for your datasets.

Submitted by Nguyen Khanh on Thu, 09/15/2022 - 03:39

I cannot download the dataset even after registering.

Submitted by ADETAYO ADETORO on Thu, 12/15/2022 - 14:47

qi liu

Submitted by qi liu on Fri, 09/08/2023 - 02:00