Website Fingerprinting Dataset of Browsing Network Traffic for Desktop and Mobile Webpages

Name: Website Fingerprinting Dataset of Browsing Network Traffic for Desktop and Mobile Webpages
Creator: Mohamad Amar Irsyad Mohd Aminuddin
License: https://creativecommons.org/licenses/by/4.0/
Keywords: Security

Citation Author(s): Mohamad Amar Irsyad Mohd Aminuddin (Universiti Sains Malaysia); Zarul Fitri Zaaba (Universiti Sains Malaysia)
Submitted by: Mohamad Amar Irsyad Mohd Aminuddin
Last updated: Mon, 10/21/2024 - 18:57
DOI: 10.21227/8drg-rn32


Categories:

Security

Keywords:

Security; Privacy; Traffic Analysis; Website Fingerprinting


Abstract

This dataset consists of Tor cell files extracted from browsing simulations performed with Tor Browser. The simulations cover both desktop and mobile webpages. The data were collected using the WFP-Collector tool (https://github.com/irsyadpage/WFP-Collector), and all the configuration needed to perform the simulations is documented in the tool's repository.

The webpage URLs are the top 100 websites from the list at: https://dataforseo.com/free-seo-stats/top-1000-websites.

Each webpage URL is visited 90 times in each browsing mode (desktop and mobile).

The captured network traffic is then extracted into Tor cells without removing SENDME cells. Each Tor cell file contains the network request and response traces with their relevant timestamps.

Instructions:

Files are named "X-Y.cell", where "X" identifies the webpage URL (i.e., the class) and "Y" is the instance number. The desktop and mobile datasets use the same webpage URLs to ensure comparable content.
Each file contains a list of timestamps and cell directions for one webpage instance.
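The naming scheme can be parsed programmatically, for example to check that each webpage has the expected number of instances. The following is a minimal Python sketch, assuming every file ends in `.cell` and that the instance number follows the last hyphen (so an "X" that itself contains hyphens is still handled). The helper names `parse_cell_filename` and `incomplete_pages` are hypothetical, and the default of 90 is taken from the number of visits per URL stated above, assuming every visit produced a file.

```python
from collections import Counter
from pathlib import Path


def parse_cell_filename(name: str) -> tuple[str, int]:
    """Split an "X-Y.cell" file name into (X, Y), with Y as an integer."""
    path = Path(name)
    if path.suffix != ".cell":
        raise ValueError(f"not a .cell file: {name}")
    # Split on the LAST hyphen so that any hyphens inside X are preserved.
    page, sep, instance = path.stem.rpartition("-")
    if not sep or not page:
        raise ValueError(f"file name does not match X-Y.cell: {name}")
    return page, int(instance)


def incomplete_pages(folder: str, expected: int = 90) -> dict[str, int]:
    """Return {X: count} for every webpage whose instance count differs from `expected`."""
    counts = Counter(parse_cell_filename(p.name)[0] for p in Path(folder).glob("*.cell"))
    return {page: n for page, n in counts.items() if n != expected}
```

Calling `incomplete_pages("desktop")` returns an empty dict when every webpage present in the folder has exactly 90 files. Webpages with no files at all never appear in the counter, so the number of distinct "X" values should also be compared against the 100 selected URLs.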

To read the files:
1. Choose the "desktop" or "mobile" folder.
2. In the chosen folder, iterate over each file.
3. Use the class number and the instance number from the file name to determine the appropriate data ingestion process (e.g., feature selection or feature extraction).
4. In each file, iterate over each line to read the timestamp, cell size (always 1 in this dataset), and cell direction.
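The four reading steps can be sketched in Python as follows. The exact column layout of each line is not specified in this description, so the sketch assumes whitespace-separated fields in one of two hypothetical layouts: `timestamp value`, where the sign of `value` gives the direction and its magnitude the cell size, or `timestamp size direction`. Which sign denotes outgoing traffic is also an assumption to verify against WFP-Collector. All function names are hypothetical, and `direction_vector` is only one illustrative feature-extraction choice (a fixed-length, zero-padded direction sequence, a representation common in website fingerprinting work), not the authors' method.

```python
from collections.abc import Iterator
from pathlib import Path

Cell = tuple[float, int, int]  # (timestamp, cell_size, direction)


def parse_cell_filename(name: str) -> tuple[str, int]:
    """Split an "X-Y.cell" file name into (X, Y) on the last hyphen."""
    page, sep, instance = Path(name).stem.rpartition("-")
    if not sep or not page:
        raise ValueError(f"file name does not match X-Y.cell: {name}")
    return page, int(instance)


def parse_line(line: str) -> Cell:
    """Return (timestamp, cell_size, direction) for one line; the layout is ASSUMED."""
    fields = line.split()
    if len(fields) == 2:
        # Assumed layout "timestamp signed_value": sign = direction, magnitude = size.
        timestamp, value = float(fields[0]), int(float(fields[1]))
        if value == 0:
            raise ValueError(f"zero cell value: {line!r}")
        return timestamp, abs(value), 1 if value > 0 else -1
    if len(fields) == 3:
        # Assumed layout "timestamp size direction".
        return float(fields[0]), int(float(fields[1])), int(float(fields[2]))
    raise ValueError(f"unexpected line format: {line!r}")


def read_cell_file(path: Path) -> list[Cell]:
    """Step 4: parse every non-blank line of one .cell file."""
    with open(path, encoding="utf-8") as fh:
        return [parse_line(line) for line in fh if line.strip()]


def load_folder(folder: str) -> Iterator[tuple[str, int, list[Cell]]]:
    """Steps 1-3: yield (page, instance, cells) for each .cell file in the folder."""
    for path in sorted(Path(folder).glob("*.cell")):
        page, instance = parse_cell_filename(path.name)
        yield page, instance, read_cell_file(path)


def direction_vector(cells: list[Cell], length: int = 5000) -> list[int]:
    """Hypothetical feature: direction sequence truncated or zero-padded to `length`."""
    seq = [direction for _, _, direction in cells][:length]
    return seq + [0] * (length - len(seq))
```

For instance, iterating over `load_folder("desktop")` and applying `direction_vector` to each trace yields one fixed-length vector per instance, labelled by its webpage `page`. The length of 5000 is an arbitrary illustrative value; since SENDME cells are not removed, they remain in these sequences.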