This repository contains the results of 30 public Internet browsing experiments run from a computer on the campus network of the Public University of Navarre; 20 of them used plaintext HTTP browsing, while 10 used HTTPS. We provide both the original data sources, in the form of network packet traces and HAR waterfalls, and the processed results, formatted as line-based text files.
Each experiment consisted of a Selenium-automated web browser (Google Chrome 80.0) visiting a set of predefined websites with all caching options disabled. Both network packet traces and in-browser measurements were collected: the network measurements with tcpdump running at the client, and the in-browser measurements through the HAR Export Trigger extension. Both sets of files have been uploaded.
The sets of websites for the HTTP and HTTPS experiments are different, as modern websites usually support HTTPS but not plain HTTP. The HTTPS set was obtained by collecting the top 2000 websites from the Alexa Top Ranking. The HTTP set is the subset of those 2000 websites that supported plain-text HTTP. To increase the number of plain-HTTP measurements, each of these websites was crawled by following its embedded ‘http://’ links.
For each web resource requested by the browser, we computed the time elapsed between the HTTP request being sent and the response being fully received; this is the resource's response time. Each response time, together with the resource's URL and the timestamp at which the request was made, constitutes a sample. These samples are obtained both from the browser measurements and from the network traffic. For the HTTPS experiments, the network data was decrypted using the ephemeral per-session encryption keys generated by the web browser; the files containing these keys have also been uploaded.
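As an illustration of how such samples can be extracted from the in-browser side, the sketch below reads a HAR file and approximates the response time defined above as the sum of the send, wait and receive phases of the standard HAR timings object. This is not the authors' processing code, only a minimal example of the computation described.

```python
import json

def har_samples(har_path):
    """Yield (url, started, response_time_s) tuples, one per HAR entry.

    The response time (request sent until response fully received) is
    approximated as send + wait + receive from the HAR timings object;
    -1 values (timing not available) are treated as 0.
    """
    with open(har_path) as f:
        har = json.load(f)
    for entry in har["log"]["entries"]:
        timings = entry["timings"]
        resp_ms = sum(max(timings.get(k, 0), 0)
                      for k in ("send", "wait", "receive"))
        yield (entry["request"]["url"],
               entry["startedDateTime"],
               resp_ms / 1000.0)  # HAR timings are in milliseconds
```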
A number of resources, such as cascading style sheets or images, are requested more than once during each test. Although we deactivated the cache, the browser sometimes still reported such repeated resources with a false response time of zero, since the request is never issued to the server but served from a cache. In addition, a small number of requests trigger an exception in the browser, which prevents data from being collected at the client side even though the request and response are present in the network traffic. These behaviours complicate one-to-one comparisons between network and in-browser measurements, because a different number of response times for a given resource may be found in the network traffic and in the browser report. We therefore exported to the text files only the first response time seen for each unique URL; this filtering removes the false measurements reported by the browser. If this filtering is not desired, all of the data can be recovered from the uploaded pcap and HAR files.
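The deduplication step described above can be sketched as follows. This is only an illustration of the filtering rule (keep the earliest sample per unique URL), operating on (timestamp, url, response_time) tuples, not the actual export script.

```python
def first_per_url(samples):
    """Keep only the first sample seen for each unique URL.

    Mirrors the filtering applied when exporting the text files: repeated
    requests for the same resource (which may carry a false response time
    of zero) are dropped. `samples` is an iterable of
    (timestamp, url, response_time) tuples.
    """
    seen = set()
    kept = []
    for ts, url, tresp in sorted(samples, key=lambda s: s[0]):
        if url not in seen:
            seen.add(url)
            kept.append((ts, url, tresp))
    return kept
```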
The dataset contains the original PCAP and HAR files, and also the post-processed files obtained from them. The raw data is contained in the raw_http.zip and raw_https.zip files, while the post-processed files are contained in the data.zip file. Inside the data.zip archive there are two directories, corresponding to the HTTP and HTTPS experiments respectively.
Both raw data archives contain files named X.pcap and subdirectories named X_har (with X being the name of each individual experiment), corresponding to the data gathered from network traces and in-browser measurements respectively. Inside each X_har directory, a .har file is stored for each visited site, containing its full download waterfall. Additionally, the decryption keys for the HTTPS experiments are provided as files named X.key.
The data.zip archive contains three files for each experiment, amounting to a total of 60 and 30 files for HTTP and HTTPS respectively.
The three files describing each experiment contain line-based text data, and are named X_network_tresp.txt, X_browser_tresp.txt and X_conn_info.txt, with X being the name of each individual experiment. The first two files contain, on each line, space-separated fields describing a single request-response sample. X_network_tresp.txt contains the information gathered from network traces, while X_browser_tresp.txt was obtained from browser instrumentation. On the other hand, X_conn_info.txt contains, on each line, space-separated fields related to each TCP connection present during the experiment, obtained through network traces.
The connections in X_conn_info.txt and the samples in X_network_tresp.txt are associated through a unique connection ID field present on each line of both files. Note that this is a one-to-many relationship: a connection ID is associated with a single TCP stream (i.e. one line in X_conn_info.txt), but with one or more samples (i.e. lines in X_network_tresp.txt).
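The one-to-many association can be materialised with a simple grouping pass over the two files. In this sketch the position of the connection ID field is an assumption (the real layout is given in format.txt); `id_field=0` is used purely for illustration.

```python
from collections import defaultdict

def join_by_connection(conn_path, tresp_path, id_field=0):
    """Group samples from X_network_tresp.txt by their TCP connection
    in X_conn_info.txt.

    Assumes, for illustration only, that the connection ID is the field
    at index `id_field` on every space-separated line of both files;
    consult format.txt for the actual field positions.
    """
    connections = {}
    with open(conn_path) as f:
        for line in f:
            fields = line.split()
            if fields:
                connections[fields[id_field]] = fields
    samples = defaultdict(list)
    with open(tresp_path) as f:
        for line in f:
            fields = line.split()
            if fields:
                samples[fields[id_field]].append(fields)
    # One-to-many: one connection record, one or more samples per ID.
    return {cid: (connections.get(cid), sample_list)
            for cid, sample_list in samples.items()}
```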
We describe below the line format of each file. This information is also included in the "format.txt" file, located in the top-level directory of the compressed archive.
X_conn_info.txt (per TCP connection):
- Number of retransmissions
- Number of sequence holes
- Number of data packets, client to server
- Number of data packets, server to client

X_network_tresp.txt (per sample, from network traces):
- Request timestamp (seconds)
- Response time (seconds)
- Response size (bytes)

X_browser_tresp.txt (per sample, from browser instrumentation):
- Request timestamp (seconds)
- Response time (seconds)
This repository contains the results of running more than 70 ransomware samples, from different families, collected since 2015. It contains the network traffic (DNS and TCP) and the input/output (I/O) operations generated by the malware while encrypting a network shared directory. These data are contained in three files per ransomware sample: one with information about the DNS requests, another with the TCP connections, and a third with the I/O operations. This information can be useful for testing new and old ransomware detection tools and comparing their results.
The dataset is organised as a single zip file containing all of the text files, with one directory per ransomware sample. A second zip file with all the trace files organised in the same manner would be extremely large (more than 650 GB after compression), so, to ease downloading, we have instead uploaded the trace files in separate zip files, one per directory or scenario. We have also published the trace files on an external website (link), where they can be downloaded individually; if only a single trace file is needed, we recommend visiting the website and downloading it from there.
For each malware sample, three text files are generated (dnsInfo.txt, TCPconnInfo.txt and IOops.txt) and placed in a directory named after the ransomware strain. The structure of all directories and subdirectories is shown in the README.pdf file and in the text file “repositoryStructure.txt”.
The I/O operations file contains one text line per operation (open or close file, read, write, rename, delete, etc.). Each line contains fields separated by the blank space character (ASCII 0x20), with useful metadata about the operation (file name, read or write offset and length, timestamp, etc.). The README.pdf file explains all the fields in the I/O operations file.
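A typical first step when working with these files is to tally the operation types per trace. The column positions below are hypothetical (README.pdf documents the real layout); the sketch assumes, for illustration only, that the operation type is the first field.

```python
from collections import Counter

def count_ops(ioops_path, op_col=0):
    """Tally operation types in an IOops.txt file.

    Assumes, hypothetically, that the operation type (open, read, write,
    rename, delete, ...) is the space-separated field at index `op_col`;
    see README.pdf for the actual field positions.
    """
    counts = Counter()
    with open(ioops_path) as f:
        for line in f:
            fields = line.split()
            if fields:
                counts[fields[op_col]] += 1
    return counts
```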
The DNS info file has one line per DNS request made by the user machine. The DNS server is ‘220.127.116.11’ for all traces. The README.pdf file explains each column. The TCP info file has one line per TCP connection; if the connection carries an HTTP request, the method, response code and URL are also present in this file. As in the previous cases, the README file explains the columns and structure of the file.
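Since only some connections in the TCP info file carry HTTP fields, a rough way to select them is to scan for an HTTP method token. This is a heuristic sketch, not the file's actual schema (which README.pdf documents).

```python
HTTP_METHODS = {"GET", "POST", "HEAD", "PUT", "DELETE", "OPTIONS"}

def http_connections(tcp_info_path):
    """Return the lines of a TCPconnInfo.txt file that describe
    connections carrying an HTTP request.

    Heuristic for illustration: a line is kept if any of its
    space-separated fields is an HTTP method token. The real column
    layout is given in README.pdf.
    """
    matches = []
    with open(tcp_info_path) as f:
        for line in f:
            if any(tok in HTTP_METHODS for tok in line.split()):
                matches.append(line.rstrip("\n"))
    return matches
```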
We started downloading ransomware samples in 2015 from hybrid-analysis.com and malware-traffic-analysis.com. The samples were executed on a single machine, and the DNS and HTTP requests were collected by a traffic probe mirroring the traffic. The ‘infected’ machine mounts a directory shared by a server, whose content is encrypted by the ransomware during its activity. The operations over this directory were captured by the same traffic probe and processed with specialised software to extract the I/O operations in the format explained in the README.pdf file.
In order to analyse the ransomware behaviour, we created different shared directories and ran some samples against more than one of them. These shared directories follow a statistical distribution for the size and location of each file, aiming to simulate a user’s fileset. By changing the seed used when generating a directory, we can produce similar directories with different numbers of files, size distributions and subdirectories. The trace files of ransomware samples run in these cases can be found in zip files named ‘5GvXdirectory.zip’, where X ranges from 2 to 10. We have also run samples with a shared directory of 10 GB, whose trace files are placed in the zip file called ‘10Gdirectory’.
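The seeded generation scheme described above can be sketched as follows. The actual distributions used for the dataset are not restated here; as an assumption for illustration, file sizes are drawn from a lognormal distribution and files are scattered over a handful of subdirectories, so the same seed always reproduces the same tree.

```python
import os
import random

def make_fileset(root, n_files, seed):
    """Generate a reproducible synthetic shared directory.

    Illustration only: sizes follow a lognormal distribution (an assumed
    choice, not necessarily the dataset's) and each file lands in one of
    five subdirectories. A fixed seed yields an identical fileset.
    """
    rng = random.Random(seed)
    for i in range(n_files):
        subdir = os.path.join(root, "dir%d" % rng.randint(0, 4))
        os.makedirs(subdir, exist_ok=True)
        # Cap sizes so the sketch stays cheap to run.
        size = min(int(rng.lognormvariate(9.0, 2.0)), 1 << 20)
        with open(os.path.join(subdir, "file%d.bin" % i), "wb") as f:
            f.write(b"\0" * size)
```

Running the function twice with the same seed in two different roots produces byte-identical directory layouts, which is what allows "similar directories" to be regenerated on demand.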
We have also run one sample while sweeping the network speed, to simulate ransomware encrypting the files slowly; these traces can be found in the ‘networkSpeed.zip’ file. Finally, the samples run in a scenario with a Windows 10 user machine and server generated the traffic traces placed in the file ‘W10scenario.zip’. There are no text files for these samples, as the traffic is encrypted by version 3 of the SMB protocol (used by Windows 10 machines).
As explained above, the trace files can be downloaded individually from an external link, while the text files associated with them are placed in a single zip file (their smaller size makes it possible to download them all together).
Desktops and laptops can be maliciously exploited to violate privacy. In this paper, we consider the daily battle between a passive attacker targeting a specific user and a user who may act as an adversarial opponent. In this scenario, the attacker tries to choose the best attack vector by surreptitiously monitoring the victim’s encrypted network traffic in order to identify parameters such as the operating system (OS), browser and applications. The user, in turn, may use tools such as a Virtual Private Network (VPN), or even change protocol parameters, to protect his or her privacy.