YouTube, Netflix, Web dataset for Encrypted Traffic Classification

Citation Author(s):
Danil Shamsimukhametov, IITP RAS, Wireless Networks Lab
Mikhail Liubogoshchev, IITP RAS, Wireless Networks Lab
Evgeny Khorov, IITP RAS, Wireless Networks Lab
Ian F. Akyildiz, IITP RAS, Wireless Networks Lab
Submitted by:
Danil Shamsimuk...
Last updated:
Fri, 10/01/2021 - 09:39
DOI:
10.21227/s7x7-wd58

Abstract 

The dataset is intended for encrypted traffic classification problems. It contains three classes of flows: web flows, YouTube flows, and Netflix flows. These classes were chosen because web and video traffic account for 90% of global traffic, while YouTube and Netflix are the largest video services. The dataset is structured as follows. It includes download traces of 100 of the most popular web pages according to https://httparchive.org, 100 of the most popular YouTube videos, and 50 Netflix series and movies. To improve the diversity of the data, we collected the traces on PCs running Ubuntu 18.04, Windows 10, or Mac OS X, using Chromium-browser, Google Chrome, or Safari, respectively. Each capture file contains a single web page or video download trace for a particular OS and browser.

Instructions: 

Each *.pcap file contains the isolated download trace of a particular web service. Each filename contains the name of the web service whose traffic the file holds (e.g. youtube.com, netflix.com, facebook.com), the IP address of the device on which the traffic was collected, and a unique number. The unique number distinguishes files that hold download traces of the same web service. For traffic classification, we suggest dividing each file into flows. This can be done with an open-source tool such as TShark.
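The labelling and flow-splitting steps described above could be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline. It assumes that TShark is installed and on the PATH, that a flow is defined as a bidirectional 5-tuple (protocol, two IP/port endpoints), and that only IPv4 TCP or UDP packets are of interest. The function names (label_from_filename, tshark_command, group_flows, flows_from_pcap) are hypothetical helpers introduced for this example. Because the exact filename layout is not specified here, the label is derived only from whether the service name in the filename mentions YouTube or Netflix.

```python
import subprocess
from collections import defaultdict


def label_from_filename(filename):
    """Map a capture filename to one of the three dataset classes.

    Assumption: the service name embedded in the filename contains
    "youtube" or "netflix" for video traces; everything else is web.
    """
    name = filename.lower()
    if "youtube" in name:
        return "youtube"
    if "netflix" in name:
        return "netflix"
    return "web"


def tshark_command(pcap_path):
    """Build a TShark call that prints one tab-separated row per packet."""
    return [
        "tshark", "-r", pcap_path, "-Y", "tcp or udp", "-T", "fields",
        "-e", "ip.src", "-e", "ip.dst", "-e", "ip.proto",
        "-e", "tcp.srcport", "-e", "tcp.dstport",
        "-e", "udp.srcport", "-e", "udp.dstport",
        "-e", "frame.len",
    ]


def group_flows(rows):
    """Group TShark rows into bidirectional flows.

    Returns a dict mapping (proto, endpoint_a, endpoint_b) to the list of
    frame lengths in capture order. Rows that cannot be parsed (for
    example IPv6 packets, which have no ip.src) are skipped.
    """
    flows = defaultdict(list)
    for row in rows:
        fields = row.rstrip("\r\n").split("\t")
        if len(fields) != 8:
            continue
        src, dst, proto, tcp_sp, tcp_dp, udp_sp, udp_dp, length = fields
        try:
            sport = int(tcp_sp or udp_sp)
            dport = int(tcp_dp or udp_dp)
            key_proto = int(proto)
            size = int(length)
        except ValueError:
            continue
        if not src or not dst:
            continue
        # Sort the endpoints so both directions share one key.
        a, b = sorted([(src, sport), (dst, dport)])
        flows[(key_proto, a, b)].append(size)
    return dict(flows)


def flows_from_pcap(pcap_path):
    """Run TShark on a capture file and return its flows."""
    result = subprocess.run(
        tshark_command(pcap_path), capture_output=True, text=True, check=True
    )
    return group_flows(result.stdout.splitlines())
```

Calling flows_from_pcap on a capture file and pairing every resulting flow with label_from_filename applied to the file's name would give labelled flows. Since each file holds the isolated traffic of one service, all flows in a file share the same label under this scheme.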
