Standard Dataset
DoQ+QUIC web traffic dataset
- Submitted by: Levente Csikor
- Last updated: Tue, 12/03/2024 - 04:43
- DOI: 10.21227/km5h-g294
Abstract
Moving away from plain-text DNS communications,
users now have the option of using encrypted DNS protocols
for domain name resolutions. DNS-over-QUIC (DoQ) employs
QUIC, the latest transport protocol, for encrypted communications
between users and their recursive DNS servers. QUIC is
also poised to become the foundation of our daily web browsing
experience by replacing TCP as the transport for HTTP/3, the latest
version of the HTTP protocol.
Traditional TCP-based web browsing is vulnerable to website
fingerprinting (WFP) attacks that can identify the websites a user
visits. The emergence of QUIC-based DNS and HTTP protocols
raises an important question: are regular users better protected
from WFP attacks when using these new protocols?
To investigate this, we first collect and publicly release the
first benchmark dataset of network traffic corresponding to real
visits to QUIC-enabled websites while using DoQ for domain
resolution. This dataset will help advance the research on WFP
attacks and defenses. Second, we implement and evaluate the
first WFP attack targeting the combined use of DoQ and HTTP/3
by developing two transformer models tailored for WFP attacks.
Finally, we conduct comprehensive experiments, which reveal that
these models are effective in identifying user-visited websites,
emphasizing the need for defensive measures.
The zipped archive contains further zipped archives, each corresponding to data collected at a different vantage point provided by CloudLab (e.g., Wisconsin, Clemson, Utah).
Each of those archives contains several network trace instances (website visits and their corresponding data), organized into directories named after the visited domain. Within each directory, CSV files hold the packet-level capture with the following columns:
protocol;length;relative_time;direction;src_ip;src_port;dst_ip;dst_port
protocol: 1 if DoQ, 0 otherwise
length: packet size
relative_time: inter-arrival time of the packets
direction: 0 incoming, 1 outgoing
src_ip, src_port, dst_ip, dst_port: taken from the raw packet-header metadata (source IP and port, destination IP and port)
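
The column layout above can be read with Python's standard csv module. The following is a minimal sketch, not part of the dataset's own tooling. It assumes that each CSV file is semicolon-delimited (as the column list suggests) and begins with a header row carrying those column names. The Packet class, the load_trace and summarize helpers, and the sample rows are all hypothetical and serve only as illustration; the sample values are made up and do not come from the dataset.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, List, TextIO


@dataclass
class Packet:
    """One row of a trace file (hypothetical helper type)."""
    is_doq: bool          # protocol: 1 if DoQ, 0 otherwise
    length: int           # packet size
    relative_time: float  # timing value as described in the column list
    outgoing: bool        # direction: 0 incoming, 1 outgoing
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def load_trace(handle: TextIO) -> List[Packet]:
    """Parse one trace. Assumes ';' as delimiter and a header row."""
    reader = csv.DictReader(handle, delimiter=";")
    packets = []
    for row in reader:
        packets.append(Packet(
            is_doq=int(row["protocol"]) == 1,
            length=int(row["length"]),
            relative_time=float(row["relative_time"]),
            outgoing=int(row["direction"]) == 1,
            src_ip=row["src_ip"],
            src_port=int(row["src_port"]),
            dst_ip=row["dst_ip"],
            dst_port=int(row["dst_port"]),
        ))
    return packets


def summarize(packets: List[Packet]) -> Dict[str, int]:
    """Count DoQ packets and total bytes per direction."""
    return {
        "doq_packets": sum(1 for p in packets if p.is_doq),
        "bytes_out": sum(p.length for p in packets if p.outgoing),
        "bytes_in": sum(p.length for p in packets if not p.outgoing),
    }


# Made-up sample rows for illustration only (not taken from the dataset).
SAMPLE = """protocol;length;relative_time;direction;src_ip;src_port;dst_ip;dst_port
1;120;0.0;1;10.0.0.2;50000;192.0.2.53;853
1;240;0.012;0;192.0.2.53;853;10.0.0.2;50000
0;1250;0.030;1;10.0.0.2;50001;198.51.100.7;443
0;1300;0.045;0;198.51.100.7;443;10.0.0.2;50001
"""

trace = load_trace(io.StringIO(SAMPLE))
print(summarize(trace))
```

To read an actual trace, pass an open file instead of the in-memory sample, for example load_trace(open(path, newline="")), where path points to one of the CSV files inside a domain directory. If the files turn out to have no header row, csv.DictReader accepts the column names through its fieldnames argument.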