Abstract

The goal of our research is to identify malicious advertisement URLs and to apply an adversarial attack to ensemble models. We extract lexical and web-scraped features using Python code. Four machine learning algorithms are then applied for classification, and K-Means clustering is used for visual understanding. We check the vulnerability of the models using adversarial examples: we apply the Zeroth Order Optimization (ZOO) adversarial attack to the models and compute the attack accuracy.
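To illustrate the idea behind a zeroth-order attack, the following is a minimal sketch, not the authors' implementation. It estimates the gradient of a black-box score using symmetric finite differences, which is the core idea of Zeroth Order Optimization, and then takes plain descent steps to lower the score. The function `toy_malicious_score`, its weights, and the step sizes are hypothetical stand-ins for a trained classifier; the published ZOO attack additionally uses stochastic coordinate selection and a dedicated attack loss, which are omitted here.

```python
from typing import Callable, List, Sequence


def estimate_gradient(f: Callable[[Sequence[float]], float],
                      x: Sequence[float], h: float = 1e-4) -> List[float]:
    """Estimate the gradient of f at x using only function evaluations."""
    grad = []
    for i in range(len(x)):
        plus = list(x)
        minus = list(x)
        plus[i] += h
        minus[i] -= h
        # Symmetric difference quotient along coordinate i
        grad.append((f(plus) - f(minus)) / (2 * h))
    return grad


def zeroth_order_descent(f: Callable[[Sequence[float]], float],
                         x: Sequence[float], step: float = 0.1,
                         iters: int = 20) -> List[float]:
    """Perturb x to lower f(x) without access to the model's internals."""
    current = list(x)
    for _ in range(iters):
        grad = estimate_gradient(f, current)
        current = [xi - step * gi for xi, gi in zip(current, grad)]
    return current


def toy_malicious_score(features: Sequence[float]) -> float:
    """Hypothetical black-box model: a higher score means 'more malicious'."""
    weights = [0.8, -0.5, 0.3]  # assumed values, for illustration only
    return sum(w * v for w, v in zip(weights, features))


original = [1.0, 0.2, 0.5]
adversarial = zeroth_order_descent(toy_malicious_score, original)
```

In this toy setting the attacker only queries `toy_malicious_score`, which mirrors the black-box assumption of the attack: no model gradients are read directly.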

The datasets are taken from different sources available on the internet. We consider 12 datasets: 6 of malicious URLs and 6 of benign URLs. Together they include about 3,980,870 URLs. We extracted 89 lexical and web-scraped features for the subsequent tasks.
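As an illustration of what lexical feature extraction from a URL can look like, here is a minimal sketch using only the Python standard library. The feature names below are hypothetical examples and are not the 89 features shipped with this dataset; the actual feature definitions are in the accompanying scripts.

```python
import string
from urllib.parse import urlparse


def lexical_features(url: str) -> dict:
    """Compute a few illustrative lexical features from a URL string."""
    # urlparse only fills in the host when a scheme or leading "//" is present
    parsed = urlparse(url if "://" in url else "//" + url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "path_length": len(parsed.path),
        "num_dots": url.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_special": sum(ch in string.punctuation for ch in url),
        "num_subdomains": max(host.count(".") - 1, 0),
        # Rough check for a dotted IPv4 address used as the host
        "has_ip_host": host.replace(".", "").isdigit() and host.count(".") == 3,
        "uses_https": parsed.scheme == "https",
    }


features = lexical_features("http://example.com/a/b?x=1")
```

Each URL becomes a fixed-length record of numbers and flags, which is the form a classifier or a clustering algorithm such as K-Means expects as input.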

Instructions:

The experimental setup for advertising URLs uses 12 distinct datasets containing 3,980,870 URLs. These datasets contain two kinds of URLs: benign and malicious. Furthermore, the malicious URLs fall into four distinct sub-categories: spam, defacement, malware, and phishing. We also examined all of the URLs using the VirusTotal tool to confirm their authenticity.

Comments

Thank you for your datasets.

Submitted by Nguyen Khanh on Thu, 09/15/2022 - 03:39

I cannot download the dataset even after registering.

Submitted by ADETAYO ADETORO on Thu, 12/15/2022 - 14:47

qi liu

Submitted by qi liu on Fri, 09/08/2023 - 02:00

Dataset Files

URL Datasets: URL Datasets.zip (4.09 MB)
Scripts: Scripts.zip (38.55 kB)

Datasets

Standard Dataset

Pristine and Malicious URLs
