YouTube, Netflix, Web dataset for Encrypted Traffic Classification
The dataset is intended for encrypted traffic classification problems. It contains three classes of flows: web flows, YouTube flows, and Netflix flows. These classes were chosen because web and video traffic account for 90% of global traffic, while YouTube and Netflix are the largest video services. The dataset is structured as follows. It includes download traces of the 100 most popular web pages according to https://httparchive.org, the 100 most popular YouTube videos, and 50 Netflix series and movies. To improve the diversity of the data, we collected the traces on PCs running Ubuntu 18.04, Windows 10, or Mac OS X, with Chromium-browser, Google Chrome, or Safari, respectively. A single capture file contains a single web page or video download trace for a particular OS and browser.
Each *.pcap file contains the download trace of the isolated traffic of a particular web service. Each filename contains the name of the web service whose traffic is contained in the file (e.g. youtube.com, netflix.com, facebook.com), the IP address of the device from which the traffic was collected, and a unique number. The unique numbers are needed so that files with download traces of the same web service can be distinguished from each other. For the traffic classification task, we propose dividing each file into flows. This can be done using an open-source tool such as TShark.
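
As an illustration of the labeling and flow-splitting steps described above, the following Python sketch assigns a class label from a capture filename and groups packet records into bidirectional flows by their 5-tuple. It is a minimal sketch under stated assumptions, not the procedure used to build the dataset. The exact filename layout is not specified here, so the example filename is hypothetical and the label is derived only from a substring match on the service name. The function names and the packet tuple layout (timestamp, source address, source port, destination address, destination port, protocol, length) are also assumptions; such records could, for example, be exported from a capture file with TShark's field output mode (`-T fields`).

```python
from collections import defaultdict


def label_from_filename(filename):
    """Map a capture filename to one of the three dataset classes.

    Assumption: the service name appears somewhere in the filename;
    anything that is not YouTube or Netflix is treated as web traffic.
    """
    name = filename.lower()
    if "youtube" in name:
        return "youtube"
    if "netflix" in name:
        return "netflix"
    return "web"


def flow_key(src, sport, dst, dport, proto):
    """Build a direction-independent 5-tuple key.

    The two endpoints are ordered, so packets travelling in either
    direction of the same connection map to the same key.
    """
    a, b = (src, sport), (dst, dport)
    return (proto,) + (a + b if a <= b else b + a)


def split_into_flows(packets):
    """Group packet records into flows keyed by the bidirectional 5-tuple.

    Each record is (timestamp, src, sport, dst, dport, proto, length);
    each flow keeps its packets as (timestamp, length) pairs in input order.
    """
    flows = defaultdict(list)
    for ts, src, sport, dst, dport, proto, length in packets:
        flows[flow_key(src, sport, dst, dport, proto)].append((ts, length))
    return dict(flows)


if __name__ == "__main__":
    # Hypothetical filename and made-up packet records, for illustration only.
    label = label_from_filename("youtube.com_192.168.0.10_17.pcap")
    packets = [
        (0.00, "10.0.0.2", 50000, "142.250.0.1", 443, "udp", 1250),
        (0.01, "142.250.0.1", 443, "10.0.0.2", 50000, "udp", 1350),
        (0.02, "10.0.0.2", 50001, "142.250.0.1", 443, "tcp", 74),
    ]
    for key, pkts in split_into_flows(packets).items():
        print(label, key, len(pkts))
```

Using a direction-independent key keeps the request and response packets of one connection in the same flow, which is usually what a flow-level classifier expects. A real pipeline would also need to handle flow timeouts and reuse of the same 5-tuple over time, which this sketch ignores.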