DeepCoAST Dataset

Citation Author(s): Goun Kim, HyeonJeong Kwak, Sujin Kim, and Se Eun Oh (all Ewha Womans University)
Submitted by: Goun Kim
Last updated: Tue, 07/16/2024 - 20:53
DOI: 10.21227/9chd-ng79
Data Format: pickle



Keywords: Anonymity, Flow Correlation Attack, Tor, Website Fingerprinting


Abstract

Our DeepCoAST dataset explores the vulnerabilities of traffic-splitting Website Fingerprinting (WF) defenses, namely TrafficSliver, HyWF, and CoMPS. It comprises defended traces generated from the BigEnough dataset, which contains Tor cell traces of 95 websites with 200 instances each, collected under the standard browser security level. We simulated each traffic-splitting defense assuming that every vanilla trace is split into two sub-traces. For TrafficSliver, we defined four configurations by varying the number of Tor cells and the path selection probabilities to simulate realistic network conditions; for HyWF and CoMPS, we used one configuration each. All configurations follow the parameters reported in the original defense papers. We built the training and test sets by extracting the features described in our paper: Direction, Tik-Tok, ICD, ICDS, and 1-D TAM. The dataset thus provides a comprehensive basis for evaluating traffic-splitting WF defenses.
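The splitting and feature-extraction steps described above can be illustrated with a minimal Python sketch. It assumes a trace is a list of `(timestamp, direction)` cells and splits it into two paths using a simplified batched random assignment. This is a stand-in for illustration, not the TrafficSliver, HyWF, or CoMPS algorithms as published and not the authors' simulator. It computes only the Direction and Tik-Tok features, whose definitions (cell direction, and timestamp multiplied by direction) are standard in the WF literature. The function names, the `probs` and `batch_range` parameters, and the 5,000-cell input length are hypothetical choices.

```python
import random
from typing import List, Tuple

# Assumed trace format: a list of (timestamp_in_seconds, direction) cells,
# where direction is +1 for outgoing and -1 for incoming Tor cells.
Trace = List[Tuple[float, int]]


def split_trace(trace: Trace,
                probs: Tuple[float, float] = (0.5, 0.5),
                batch_range: Tuple[int, int] = (50, 70),
                seed=None) -> Tuple[Trace, Trace]:
    """Split one vanilla trace into two sub-traces (path0, path1).

    Hypothetical simplified scheme: consecutive runs of n cells, with n drawn
    uniformly from batch_range, go to path 0 with probability probs[0] and to
    path 1 otherwise. Cells keep their original timestamps.
    """
    rng = random.Random(seed)
    path0: Trace = []
    path1: Trace = []
    i = 0
    while i < len(trace):
        n = rng.randint(*batch_range)  # inclusive on both ends
        batch = trace[i:i + n]
        (path0 if rng.random() < probs[0] else path1).extend(batch)
        i += n
    return path0, path1


def direction_feature(trace: Trace, length: int = 5000) -> List[int]:
    """Direction sequence (+1/-1), truncated or zero-padded to `length`."""
    seq = [d for _, d in trace][:length]
    return seq + [0] * (length - len(seq))


def tiktok_feature(trace: Trace, length: int = 5000) -> List[float]:
    """Tik-Tok-style directional timing: timestamp * direction per cell."""
    seq = [t * d for t, d in trace][:length]
    return seq + [0.0] * (length - len(seq))


if __name__ == "__main__":
    # Toy vanilla trace: 300 cells, 10 ms apart, mostly incoming.
    vanilla = [(i * 0.01, 1 if i % 4 == 0 else -1) for i in range(300)]
    p0, p1 = split_trace(vanilla, probs=(0.7, 0.3), seed=42)
    for name, p in (("path0", p0), ("path1", p1)):
        print(name, len(p), direction_feature(p)[:5], tiktok_feature(p)[:3])
```

Here `probs` plays the role of the path selection probabilities. ICD, ICDS, and 1-D TAM are omitted because their exact definitions are given in the paper.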

Instructions:

1. Overview: This dataset is designed for research on traffic-splitting Website Fingerprinting (WF) defenses on Tor. It includes defended traces for three defenses: TrafficSliver, HyWF, and CoMPS.

2. Dataset Structure: The dataset is organized into three top-level directories, one per defense. The TrafficSliver directory is further divided into two sub-directories, one for each number-of-Tor-cells configuration, and each of these is divided into sub-directories by path selection probability. Each configuration directory contains five feature sub-directories, one per extracted feature. These hold split-trace pairs (path0, path1), where each pair together constitutes one defended trace.

3. Filtering Process: Traces with fewer than 10 cells were removed to ensure data quality. Users should consider this filtering step when designing experiments or analyzing the dataset.

4. Usage Recommendation: We recommend using this dataset to evaluate the effectiveness of WF defense strategies, perform comparative analyses with other datasets, or develop new defense mechanisms.

5. Citing the Dataset: When using this dataset in your research, please cite both the original BigEnough dataset and our paper.

6. Feedback and Contributions: We welcome feedback and contributions from the research community.

7. Acknowledgement: This work was partly supported by the Ewha Womans University Research Grant of 2022, the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. RS-2023-00222385, RS-2022-00166669), and the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. RS-2022-00155966, Artificial Intelligence Convergence Innovation Human Resources Development (Ewha Womans University)).
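For the directory layout and filtering step described in items 2 and 3, the following Python sketch loads one split-trace pair and re-applies the 10-cell threshold, for example to traces you simulate yourself. It assumes each feature sub-directory holds two pickle files, here called `path0.pkl` and `path1.pkl`, each containing a list of per-instance cell sequences. These file names, the structure of the pickled objects, and the rule of keeping a pair only when both paths pass the threshold are hypothetical, so check them against the downloaded files. The length check only makes sense on raw cell sequences, not on fixed-length padded feature vectors.

```python
import os
import pickle
import tempfile
from typing import Any, List, Sequence, Tuple

# Hypothetical file names for the two split traces in one feature
# sub-directory; check the actual names in the downloaded archive.
PAIR_FILES = ("path0.pkl", "path1.pkl")


def load_split_pair(feature_dir: str,
                    names: Sequence[str] = PAIR_FILES) -> Tuple[Any, Any]:
    """Load the (path0, path1) pickles from one feature sub-directory.

    pickle can execute arbitrary code while loading, so only open files
    from a source you trust.
    """
    loaded = []
    for name in names:
        with open(os.path.join(feature_dir, name), "rb") as f:
            loaded.append(pickle.load(f))
    return loaded[0], loaded[1]


def filter_pairs(path0: List[Sequence], path1: List[Sequence],
                 min_cells: int = 10) -> Tuple[List[Sequence], List[Sequence]]:
    """Keep only pairs in which both split traces have >= min_cells cells.

    Requiring both paths to pass is an assumption; the dataset notes only
    state that traces with fewer than 10 cells were removed.
    """
    kept = [(a, b) for a, b in zip(path0, path1)
            if len(a) >= min_cells and len(b) >= min_cells]
    return [a for a, _ in kept], [b for _, b in kept]


if __name__ == "__main__":
    # Round-trip toy data through a temporary directory.
    with tempfile.TemporaryDirectory() as d:
        toy0 = [[1, -1] * 6, [1, -1]]   # 12 cells, 2 cells
        toy1 = [[-1] * 15, [1] * 20]    # 15 cells, 20 cells
        for name, data in zip(PAIR_FILES, (toy0, toy1)):
            with open(os.path.join(d, name), "wb") as f:
                pickle.dump(data, f)
        p0, p1 = load_split_pair(d)
        f0, f1 = filter_pairs(p0, p1)
        print(len(f0), len(f1))  # prints 1 1
```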

Funding Agency

Korea government (MSIT); Ewha Womans University

Grant Number

RS-2023-00222385, RS-2022-00166669, RS-2022-00155966, Ewha Womans University Research Grant of 2022