Abstract

Website fingerprinting attacks, which use statistical analysis on network traffic to compromise user privacy, have been shown to be effective even if the traffic is sent over anonymity-preserving networks such as Tor. The classical attack model used to evaluate website fingerprinting attacks assumes an on-path adversary, who can observe all traffic traveling between the user's computer and the secure network.
In this work we investigate these attacks under a different attack model, in which the adversary is capable of sending a small amount of malicious JavaScript code to the target user's computer. The malicious code mounts a cache side-channel attack, which exploits the effects of contention on the CPU's cache, to identify other websites being browsed. The effectiveness of this attack scenario has never been systematically analyzed, especially in the open-world model which assumes that the user is visiting a mix of both sensitive and non-sensitive sites.
We show that cache website fingerprinting attacks in JavaScript are highly feasible. Specifically, we use machine learning techniques to classify traces of cache activity. Unlike prior works, which try to identify cache conflicts, our work measures the overall occupancy of the last-level cache. We show that our approach achieves high classification accuracy in both the open-world and the closed-world models. We further show that our attack is more resistant than network-based fingerprinting to the effects of response caching, and that our techniques are resilient both to network-based defenses and to side-channel countermeasures introduced to modern browsers as a response to the Spectre attack. To protect against cache-based website fingerprinting, new defense mechanisms must be introduced to privacy-sensitive browsers and websites. We investigate one such mechanism, and show that generating artificial cache activity reduces the effectiveness of the attack and completely eliminates it when used in the Tor Browser.

Instructions:

Untar the data.
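
For readers who prefer to extract the archive from Python rather than the command line, the following is a minimal sketch using the standard-library tarfile module. The function name extract_archive is an illustrative choice, not part of the dataset tooling.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a tar archive into dest_dir and return the top-level entry names.

    Hypothetical helper for illustration; it is not shipped with the dataset.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path) as tar:
        # Use the safer "data" extraction filter where this Python supports it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return sorted(p.name for p in dest.iterdir())
```

A call such as extract_archive("cacheprints.tar", "cacheprints") would unpack the archive listed under Dataset Files; the destination directory name here is arbitrary. The archive is listed at 140.23 GB, so the destination needs at least that much free space.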
Each directory corresponds to a different data collection setting:

Data collected with JavaScript:

linux_chrome: data collected on Linux using the Chrome browser.

linux_ff59: data collected on Linux using the Firefox browser.

linux_tor: data collected on Linux using the Tor Browser.

win_chrome: data collected on Windows 10 using the Chrome browser.

win_ff59: data collected on Windows 10 using the Firefox browser.

mac_safari: data collected on a Mac using the Safari browser.

linux_tor_counter: data collected on Linux using the Tor Browser while running countermeasures.

CW: closed-world setting; every website appears several times.

OW: open-world setting; every website appears only once (or very few times).
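
The directory names above can be captured in a small lookup table, which is convenient for checking which settings are present after extraction. This is a sketch only: it assumes the setting directories sit directly under the extraction root, which the instructions do not state, and the names SETTINGS and find_settings are hypothetical.

```python
from pathlib import Path

# Directory names and their meanings, as listed in the instructions above.
SETTINGS = {
    "linux_chrome": ("Linux", "Chrome"),
    "linux_ff59": ("Linux", "Firefox"),
    "linux_tor": ("Linux", "Tor Browser"),
    "win_chrome": ("Windows 10", "Chrome"),
    "win_ff59": ("Windows 10", "Firefox"),
    "mac_safari": ("Mac", "Safari"),
    "linux_tor_counter": ("Linux", "Tor Browser, countermeasures running"),
}


def find_settings(root):
    """Return {directory name: (os, browser)} for known setting directories under root.

    Assumption: the setting directories are immediate children of `root`.
    """
    root = Path(root)
    return {
        name: desc
        for name, desc in SETTINGS.items()
        if (root / name).is_dir()
    }
```

Calling find_settings on the extraction directory returns only the settings that are actually present, so an unexpected layout shows up as an empty or partial result rather than an error.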

Use the following online Colab notebook to evaluate the classifiers on the test set:

https://colab.research.google.com/drive/1UgbqVvoEjpG61c1CYbKdjZz8kQKHGlgU

Dataset Files

cacheprints.tar (140.23 GB)
