DoQ+QUIC web traffic dataset

Citation Author(s):
Levente Csikor
Institute for Infocomm Research (I2R), A*STAR
Submitted by:
Levente Csikor
Last updated:
Tue, 12/03/2024 - 04:43
DOI:
10.21227/km5h-g294

Abstract 

Moving away from plain-text DNS communications,
users now have the option of using encrypted DNS protocols
for domain name resolution. DNS-over-QUIC (DoQ) employs
QUIC, the latest transport protocol, for encrypted communications
between users and their recursive DNS servers. QUIC is
also poised to become the foundation of our daily web browsing
experience, as it replaces TCP as the transport for HTTP/3, the
latest version of the HTTP protocol.
Traditional TCP-based web browsing is vulnerable to website
fingerprinting (WFP) attacks that can identify the websites a user
visits. The emergence of QUIC-based DNS and HTTP protocols
raises an important question: are regular users better protected
from WFP attacks when using these new protocols?
To investigate this, we collect and publicly release the
first benchmark dataset of network traffic corresponding to real
visits to QUIC-enabled websites while using DoQ for domain
resolution. This dataset will help advance research on WFP
attacks and defenses. Second, we implement and evaluate the
first WFP attack targeting the combined use of the DoQ and HTTP/3
protocols, developing two transformer models tailored
for WFP attacks. Finally, we conduct comprehensive experiments,
which reveal that these models are effective in identifying
user-visited websites, emphasizing the need for defensive measures.

Instructions: 

The zipped archive contains further zipped archives, one for each CloudLab vantage point at which data was collected, e.g., Wisconsin, Clemson, Utah.
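
The nested layout can be unpacked with Python's standard library. The following is a minimal sketch, not an official loader: the function name `extract_nested` is hypothetical, and it assumes the inner archives are ordinary `.zip` files that fit in memory. The actual archive file names are not specified here, so none are hard-coded.

```python
import io
import zipfile
from pathlib import Path


def extract_nested(outer_zip, dest):
    """Unpack every inner .zip found in outer_zip into its own directory under dest.

    outer_zip may be a path or a binary file-like object.
    Returns the list of directories created for the inner archives.
    """
    dest = Path(dest)
    extracted = []
    with zipfile.ZipFile(outer_zip) as outer:
        for name in outer.namelist():
            if not name.lower().endswith(".zip"):
                continue  # ignore anything that is not an inner archive
            # Read the inner archive into memory and unpack it into a
            # directory named after it (e.g. "utah.zip" -> dest/"utah").
            inner_bytes = io.BytesIO(outer.read(name))
            target = dest / Path(name).stem
            with zipfile.ZipFile(inner_bytes) as inner:
                inner.extractall(target)
            extracted.append(target)
    return extracted
```

For very large inner archives, extracting the outer archive to disk first and opening each inner file by path would avoid holding it in memory.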

Each of those zipped archives contains several network trace instances (website visits and their corresponding data), organized into directories named after the visited domains. Within each directory, CSV files hold the packet-level captures, with the following semicolon-separated columns:

protocol;length;relative_time;direction;src_ip;src_port;dst_ip;dst_port

protocol: 1 if DoQ, 0 otherwise

length: packet size

relative_time: inter-arrival time of the packets

direction: 0 incoming, 1 outgoing

src_ip, src_port, dst_ip, dst_port: source and destination IP addresses and ports, taken from the raw packet-header metadata
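
The column layout above can be read with Python's `csv` module using `;` as the delimiter. This is a minimal sketch under stated assumptions: the `Packet` type and `read_trace` function are hypothetical names, the flags are assumed to be stored as the literal strings `0` and `1`, ports and lengths are assumed to be integers, and a header row is skipped only if one is present. The sample rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io
from typing import NamedTuple


class Packet(NamedTuple):
    is_doq: bool          # protocol column: 1 if DoQ, 0 otherwise
    length: int           # packet size
    relative_time: float  # timing value as stored in the file
    outgoing: bool        # direction column: 0 incoming, 1 outgoing
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def read_trace(lines):
    """Parse semicolon-separated packet rows from an iterable of text lines."""
    packets = []
    for row in csv.reader(lines, delimiter=";"):
        # Skip blank lines and a header row, if the file has one.
        if not row or row[0] == "protocol":
            continue
        protocol, length, rel_time, direction, src_ip, src_port, dst_ip, dst_port = row
        packets.append(Packet(
            protocol == "1", int(length), float(rel_time), direction == "1",
            src_ip, int(src_port), dst_ip, int(dst_port),
        ))
    return packets


# Illustrative rows only (invented values, documentation IP ranges).
sample = io.StringIO(
    "protocol;length;relative_time;direction;src_ip;src_port;dst_ip;dst_port\n"
    "1;1252;0.0;1;10.0.0.2;50000;192.0.2.1;853\n"
    "0;1350;0.012;0;192.0.2.10;443;10.0.0.2;50001\n"
)
packets = read_trace(sample)
doq_bytes = sum(p.length for p in packets if p.is_doq)
```

To read a real trace, pass an open file instead of the sample, for example `with open(path, newline="") as f: packets = read_trace(f)`.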