Website Fingerprinting Dataset of Browsing Network Traffic for Desktop and Mobile Webpages

Citation Author(s):
Mohamad Amar Irsyad Mohd Aminuddin, Universiti Sains Malaysia
Zarul Fitri Zaaba, Universiti Sains Malaysia
Submitted by:
Mohamad Amar Ir...
Last updated:
Mon, 10/21/2024 - 14:57
DOI:
10.21227/8drg-rn32
License:
0

Abstract 

This is a dataset of Tor cell files extracted from browsing simulations using Tor Browser. The simulations cover both desktop and mobile webpages. The data were collected using the WFP-Collector tool (https://github.com/irsyadpage/WFP-Collector). All the necessary configuration to perform the simulations is detailed in the tool repository.

The webpage URLs were selected by taking the first 100 websites listed at: https://dataforseo.com/free-seo-stats/top-1000-websites.

Each webpage URL is visited 90 times in each of the desktop and mobile browsing modes.

The captured network traffic is then extracted into Tor cells without removing SENDME cells. Each Tor cell file contains the network request and response traces with the relevant timestamps.

Instructions: 

The file naming scheme is "X-Y.cell", where "X" is the webpage URL and "Y" is the instance number. The desktop and mobile datasets use the same webpage URLs to ensure comparable content.
Each file contains a list of timestamps and cell directions for one webpage instance.
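The naming scheme can be split back into its two parts with a few lines of Python. This is a minimal sketch, not part of the dataset tooling: the function name `parse_cell_filename` is hypothetical, and splitting on the last hyphen is an assumption made so that any hyphen inside "X" is preserved.

```python
from pathlib import Path


def parse_cell_filename(path):
    """Split an "X-Y.cell" file name into (X, Y).

    Assumption: "Y" is the text after the LAST hyphen, so a hyphen
    that appears inside "X" stays part of "X".
    """
    stem = Path(path).stem              # drop the directory and ".cell"
    label, instance = stem.rsplit("-", 1)
    return label, int(instance)
```

For example, `parse_cell_filename("desktop/5-12.cell")` returns `("5", 12)`.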

To read the files:
1. Choose the folder "desktop" or "mobile".
2. In the chosen folder, iterate over each file.
3. Use the class number ("X") and the instance number ("Y") from the file name to determine the appropriate data ingestion process (e.g. feature selection or feature extraction).
4. In each file, iterate over each line to read the timestamp, the cell size (always 1 in this dataset), and the cell direction.
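The reading steps above can be sketched in Python as follows. The exact line layout is not stated on this page, so the sketch assumes each line holds two whitespace-separated fields: a timestamp followed by a signed number whose sign is the cell direction and whose magnitude is the cell size (1). The function names `read_cell_file` and `load_mode` are hypothetical; check a sample file and adjust the parsing if the layout differs.

```python
from pathlib import Path


def read_cell_file(path):
    """Return a list of (timestamp, direction) tuples from one .cell file.

    Assumed line layout: "<timestamp> <signed value>", where the sign of
    the second field is the cell direction and its magnitude is 1.
    """
    trace = []
    with open(path, "r", encoding="utf-8") as fh:
        for line in fh:
            parts = line.split()
            if len(parts) < 2:
                continue                      # skip blank or malformed lines
            timestamp = float(parts[0])
            signed = float(parts[1])
            direction = 1 if signed > 0 else -1
            trace.append((timestamp, direction))
    return trace


def load_mode(root, mode):
    """Load every .cell file under root/mode ("desktop" or "mobile").

    Returns a dict keyed by (X, Y) taken from the "X-Y.cell" file name.
    """
    traces = {}
    for path in sorted(Path(root, mode).glob("*.cell")):
        label, instance = path.stem.rsplit("-", 1)
        traces[(label, int(instance))] = read_cell_file(path)
    return traces
```

Calling `load_mode("dataset", "desktop")`, where `"dataset"` stands for wherever the archive was extracted, gives one trace per webpage instance that can then be passed to feature selection or feature extraction.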
