5G Traffic Datasets
- Submitted by: Yong-Hoon Choi
- Last updated: Mon, 10/02/2023 - 23:17
- DOI: 10.21227/ewhk-n061
Abstract
We created a 5G dataset by directly measuring 5G traffic from a major mobile operator in South Korea. The mobile terminal used for the measurements was a Samsung Galaxy A90 5G, equipped with a Qualcomm Snapdragon X50 5G modem. We installed PCAPdroid, a packet-sniffing application, on the terminal via Google Play. Traffic was measured sequentially, one application at a time, on two stationary terminals (only one terminal was used for noninteractive services). To capture the unique characteristics of each traffic type, we did not mix in any background traffic. The dataset contains various types of traffic, which are listed in the table below. It includes resource-intensive video traffic, which has the greatest impact on 5G network planning and provisioning.
The video streaming dataset contains traffic measured directly while watching Netflix and Amazon Prime Video, two representative over-the-top (OTT) services, on mobile devices. The live streaming dataset was measured while watching YouTube Live and two popular South Korean live broadcast services (Naver NOW and AfreecaTV). Video conferencing traffic was measured by holding live meetings on the popular Zoom, MS Teams, and Google Meet platforms. Two types of metaverse traffic were acquired: Zepeto and Roblox. Zepeto traffic was collected while staying in 'Camping' for 15 hours. Roblox traffic was collected by playing 'Collect All Pets' for 25 hours using an auto-clicker. We also collected two types of mobile network gaming traffic. The first is cloud gaming, an online game setup that runs video games on remote servers and streams them directly to the user's device. The second is a typical mobile game connected to the Internet.
The dataset was collected from May to October 2022, covers a total of 328 hours of traffic, and is provided in CSV file format. It is a timestamped time series of packet header information. Because each record includes source and destination addresses, further per-application traffic analysis is possible.
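As an illustration of the kind of analysis that timestamped packet headers with source and destination addresses allow, the following is a minimal sketch that bins packet sizes into one-second intervals and totals bytes per address pair. The column names `time`, `source`, `destination`, and `length` and the sample rows are hypothetical and are not taken from the dataset files; check the header row of the actual CSV files and adjust the parameters. The sketch also assumes timestamps are numeric seconds in ascending order.

```python
import csv
import io
from collections import defaultdict


def bytes_per_second(csv_file, time_col="time", length_col="length"):
    """Sum packet lengths into one-second bins counted from the first packet.

    Column names are assumptions; pass the real ones for the actual files.
    """
    bins = defaultdict(int)
    start = None
    for row in csv.DictReader(csv_file):
        t = float(row[time_col])
        if start is None:
            start = t  # first packet defines time zero
        bins[int(t - start)] += int(row[length_col])
    return dict(bins)


def bytes_per_flow(csv_file, src_col="source", dst_col="destination",
                   length_col="length"):
    """Total bytes for each (source, destination) address pair."""
    flows = defaultdict(int)
    for row in csv.DictReader(csv_file):
        flows[(row[src_col], row[dst_col])] += int(row[length_col])
    return dict(flows)


# Made-up sample rows, used only to show the expected input shape.
SAMPLE = (
    "time,source,destination,length\n"
    "0.00,10.0.0.2,203.0.113.5,1200\n"
    "0.40,203.0.113.5,10.0.0.2,1400\n"
    "1.10,10.0.0.2,203.0.113.5,80\n"
)

print(bytes_per_second(io.StringIO(SAMPLE)))
print(bytes_per_flow(io.StringIO(SAMPLE)))
```

For a real file, open it with `open(path, newline="")` and pass the file object in place of the `io.StringIO` sample.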
All files have been converted and saved in CSV format, making them easily accessible for machine learning. The detailed composition of the dataset is presented in the table below:
(Note: A machine learning model that generates 5G traffic after training on this dataset is available on IEEE Code Ocean under the title "ML-Based 5G Traffic Generation for Practical Simulations Using Open Datasets".)
Type | Application | Protocol | Duration | File Size |
---|---|---|---|---|
Live Streaming | YouTube Live | GQUIC | 20h 19m 38s | 0.73 GB |
Live Streaming | AfreecaTV | TCP | 20h 14m 00s | 4.06 GB |
Live Streaming | Naver NOW | TCP | 33h 50m 34s | 12.48 GB |
Stored Streaming | YouTube | QUIC | 22h 59m 51s | 1.12 GB |
Stored Streaming | Netflix | TCP | 24h 43m 02s | 0.74 GB |
Stored Streaming | Amazon Prime Video | TCP | 32h 39m 10s | 1.54 GB |
Video Conferencing | Zoom | UDP | 26h 12m 53s | 3.36 GB |
Video Conferencing | MS Teams | UDP | 28h 17m 27s | 3.71 GB |
Video Conferencing | Google Meet | UDP | 24h 01m 40s | 4.41 GB |
Metaverse | Zepeto | TCP | 15h 28m 36s | 0.16 GB |
Metaverse | Roblox | RakNet | 25h 04m 11s | 0.11 GB |
Online Game | Teamfight Tactics | UDP | 13h 46m 53s | 0.24 GB |
Online Game | Battleground | UDP | 16h 02m 57s | 0.38 GB |
Game Streaming | GeForce Now | UDP | 12h 26m 21s | 7.05 GB |
Game Streaming | KT GameBox | UDP | 12h 23m 26s | 4.36 GB |
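
The stated total length of 328 hours can be cross-checked against the per-application durations in the table above. The short sketch below copies those durations verbatim and sums them; the helper name `to_seconds` is introduced here for illustration only.

```python
import re

# Durations copied from the table above, keyed by application.
DURATIONS = {
    "YouTube Live": "20h 19m 38s",
    "AfreecaTV": "20h 14m 00s",
    "Naver NOW": "33h 50m 34s",
    "YouTube": "22h 59m 51s",
    "Netflix": "24h 43m 02s",
    "Amazon Prime Video": "32h 39m 10s",
    "Zoom": "26h 12m 53s",
    "MS Teams": "28h 17m 27s",
    "Google Meet": "24h 01m 40s",
    "Zepeto": "15h 28m 36s",
    "Roblox": "25h 04m 11s",
    "Teamfight Tactics": "13h 46m 53s",
    "Battleground": "16h 02m 57s",
    "GeForce Now": "12h 26m 21s",
    "KT GameBox": "12h 23m 26s",
}


def to_seconds(text):
    """Convert a string such as '20h 19m 38s' to a number of seconds."""
    h, m, s = map(int, re.fullmatch(r"(\d+)h (\d+)m (\d+)s", text).groups())
    return h * 3600 + m * 60 + s


total = sum(to_seconds(d) for d in DURATIONS.values())
hours, rem = divmod(total, 3600)
minutes, seconds = divmod(rem, 60)
print(f"{hours}h {minutes:02d}m {seconds:02d}s")  # prints "328h 30m 39s"
```

The sum comes to 328h 30m 39s, which agrees with the 328-hour figure quoted in the abstract.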