Multivariate Time Series Characterization and Forecasting of VoIP Traffic in Real Mobile Networks

Citation Author(s):
Mario Di Mauro
University of Salerno
Submitted by:
Mario Di Mauro
Last updated:
Sun, 07/16/2023 - 04:43
DOI:
10.21227/jef5-4w68

Abstract 

Predicting the behavior of real-time traffic (e.g., VoIP) in mobility scenarios could help operators plan their network infrastructures and optimize the allocation of resources. Accordingly, we propose a forecasting analysis of crucial QoS/QoE descriptors (some of which are neglected in the technical literature) of VoIP traffic in a real mobile environment. Please refer to our paper published in IEEE Transactions on Network and Service Management (https://ieeexplore.ieee.org/document/10184084), also available on arXiv at: https://arxiv.org/pdf/2307.06645

We release an original real-world dataset for multivariate time series prediction, which can be carried out with both statistical techniques (e.g., VAR) and Machine/Deep Learning (ML/DL) techniques. The dataset contains several features of cellular traffic organized into time series. The goal is to exploit statistical and learning-based techniques to predict the future behavior of a given feature.
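To make the statistical option concrete, the following is a minimal sketch of a first-order vector autoregression, VAR(1), fitted by ordinary least squares on a synthetic two-feature series. It is not the routine released with the dataset: the lag order of 1, the function names fit_var1 and forecast_var1, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def fit_var1(series):
    """Least-squares fit of x_t = c + A @ x_{t-1} + e_t for a (T, F) series."""
    series = np.asarray(series, dtype=float)
    # Regressors: a constant term plus the previous observation of every feature.
    past = np.hstack([np.ones((len(series) - 1, 1)), series[:-1]])  # (T-1, 1+F)
    future = series[1:]                                             # (T-1, F)
    coef, *_ = np.linalg.lstsq(past, future, rcond=None)            # (1+F, F)
    return coef[0], coef[1:].T  # intercept c (F,), transition matrix A (F, F)

def forecast_var1(c, A, last, steps=1):
    """Iterate the fitted recursion `steps` times starting from `last`."""
    x = np.asarray(last, dtype=float)
    out = []
    for _ in range(steps):
        x = c + A @ x
        out.append(x)
    return np.array(out)  # (steps, F)

# Synthetic series (hypothetical values, not taken from the dataset).
rng = np.random.default_rng(0)
A_demo = np.array([[0.5, 0.1], [0.0, 0.3]])
c_demo = np.array([1.0, 2.0])
x = np.zeros(2)
rows = []
for _ in range(200):
    x = c_demo + A_demo @ x + 0.05 * rng.standard_normal(2)
    rows.append(x)
series = np.array(rows)

c_hat, A_hat = fit_var1(series)
next_step = forecast_var1(c_hat, A_hat, series[-1], steps=1)
print(next_step.shape)  # (1, 2)
```

In practice the lag order would be selected from the data, and a dedicated library would normally replace this hand-written fit.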

Instructions: 

The equipment we used to build the real-world dataset includes:

  • 1 cellular device equipped with Linphone (an open-source softphone supporting the RTCP-XR protocol), representing User Equipment 1 (UE1);
  • 1 standard PC equipped with: i) the Linphone softphone, representing User Equipment 2 (UE2), ii) the software probe Wireshark, used to capture the network traffic between UE1 and UE2 and to save it in .pcap format.

The dataset contains network traffic gathered in a real cellular environment around the city of Salerno (Italy), which is classified as a medium-density city (around 2000 people/km^2). As of Mar. 2023, this territory is served by approximately 100 radio towers supporting a mix of LTE/LTE-Advanced (about 97%) and 5G-NSA (about 3%) technologies (data gathered from https://www.nperf.com/en/map/IT/).

We provide both:

- raw data (.pcap) available at: https://drive.google.com/file/d/1-r2Xd1VK6r7O_1KaXVYPus1Rcj6TF9DF/view?u...

- processed data (.txt) available at: https://github.com/mariodim/ml_mobile_dataset/blob/main/ML_TimeSeries_DA...

The whole dataset is split into 16 sub-datasets divided per codec and per network scenario:

  • 8 codecs: G.722, G.729, GSM, G.711, Mpeg4-16, OPUS, Speex-8, Speex-16.
  • 2 network scenarios:
    Mobile (UE1 communicates with UE2 from a moving car at an average speed of 60 km/h);
    Fixed (UE1 communicates with UE2 from a fixed location).

Please note that, due to space constraints, our paper analyzes the mobile scenario with a subset of codecs.

Each sub-dataset is the result of a post-processing stage on the raw .pcap files produced by Wireshark.
Each sub-dataset contains 6 temporal features organized in columns (the first column is the time reference):

  • MOS (Mean Opinion Score) --> it measures the call quality (expressed as a dimensionless value between 1 and 5);
  • BW (Bandwidth) --> it measures the bandwidth consumed by a voice call (expressed in kb/s);
  • RTT (Round Trip Time) --> it measures the interval between a sent and a received packet (expressed in ms);
  • JTR (Jitter) --> it measures the inter-packet jitter (expressed in ms);
  • DJB (De-jittering Buffer) --> it measures the buffer length used to reduce jitter (expressed in ms);
  • SNR (Signal-to-Noise ratio) --> it measures the objective quality of the communication channel (expressed in dB).
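As a rough illustration of this layout, the sketch below reads a sub-dataset into NumPy arrays and separates the time reference from the six features. The whitespace delimiter, the absence of a header row, and the column order (time first, then the features in the order listed above) are assumptions that should be checked against the actual files; the helper name load_subdataset is hypothetical.

```python
import io
import numpy as np

# Assumed feature order, following the list above (not verified against the files).
FEATURES = ["MOS", "BW", "RTT", "JTR", "DJB", "SNR"]

def load_subdataset(source):
    """Load a sub-dataset from a path or file-like object.

    Assumes whitespace-separated numeric values with no header row.
    Returns the time reference (N,) and the feature matrix (N, 6).
    """
    data = np.loadtxt(source, ndmin=2)
    if data.shape[1] != len(FEATURES) + 1:
        raise ValueError(f"expected {len(FEATURES) + 1} columns, got {data.shape[1]}")
    return data[:, 0], data[:, 1:]

# Two synthetic rows standing in for a real file such as "mob_g722.txt".
sample = io.StringIO(
    "0 4.1 80.0 55.0 3.2 40.0 21.5\n"
    "1 4.0 81.5 60.0 3.9 40.0 20.8\n"
)
time_ref, features = load_subdataset(sample)
print(time_ref.shape, features.shape)  # (2,) (2, 6)
```

If the files turn out to use a different separator or to include a header, the delimiter and skiprows arguments of np.loadtxt can be adjusted accordingly.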

We have developed a Python routine that performs the multivariate time series prediction of the features using different techniques.

Such a routine is available at the following link: https://colab.research.google.com/drive/1pe-p8yEP8QaVgWcOpVZ2ZwweJEqAjHh...

Please note that you have to upload the chosen sub-dataset to the same Google Colab notebook directory that contains the routine.

After uploading a sub-dataset (e.g. mob_g722.txt, meaning that the traffic is collected within the mobile scenario and the codec used is G.722), set the parameters in the first "cell" of the Python code:

  • filename --> insert the name of the uploaded file (e.g. "mob_g722.txt")
  • methods --> you can choose one of the implemented techniques for time series prediction by setting True or False
  • param --> size of your ML network (number of dense neurons, number of units, epochs, etc.)
  • perc_train --> percentage of training size (the test size is set accordingly)
  • n_past --> number of past values used in the training set
  • n_fut --> number of future samples to be predicted (default = 1)
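The roles of perc_train, n_past, and n_fut can be sketched as follows: the series is split chronologically, and each training sample pairs a window of n_past past observations with the n_fut observations that follow it. This is only one plausible reading of the parameters, not the code of the released routine; the helper names are hypothetical, and perc_train is treated here as a fraction between 0 and 1, which is an assumption.

```python
import numpy as np

def chronological_split(series, perc_train):
    """Split a (T, F) series without shuffling; perc_train is a fraction here."""
    cut = int(len(series) * perc_train)
    return series[:cut], series[cut:]

def make_windows(series, n_past, n_fut=1):
    """Turn a (T, F) series into inputs X (N, n_past, F) and targets y (N, n_fut, F)."""
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - n_past - n_fut + 1
    if n_samples <= 0:
        raise ValueError("series too short for the requested window sizes")
    X = np.stack([series[i : i + n_past] for i in range(n_samples)])
    y = np.stack([series[i + n_past : i + n_past + n_fut] for i in range(n_samples)])
    return X, y

# Synthetic series: 10 time steps, 6 features (values are placeholders).
series = np.arange(60, dtype=float).reshape(10, 6)
train, test = chronological_split(series, perc_train=0.8)
X_train, y_train = make_windows(train, n_past=3, n_fut=1)
print(X_train.shape, y_train.shape)  # (5, 3, 6) (5, 1, 6)
```

Splitting before windowing keeps every training window strictly earlier than the test data, so no future observation leaks into training.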

Output files include:

  • TXT files containing the time series predictions for each technique --> e.g., the output file mob_g722_cnn.txt is a 12-column file in this format: column #1 contains the original values of MOS, column #2 contains the predicted values of MOS, column #3 contains the original values of BW, column #4 contains the predicted values of BW, and so forth. Once exported, such files can be used to reproduce the plots with any plotting tool;
  • RMSE, MAE, and MAPE values for each technique;
  • Information about the training time of each technique (shown directly in the code output).
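The error metrics can also be recomputed from an exported prediction file. The sketch below assumes the 12 columns alternate original and predicted values and that the features after MOS and BW follow the same order as in the feature list (RTT, JTR, DJB, SNR); that ordering is an assumption, as is the helper name error_metrics. The table used here is synthetic.

```python
import numpy as np

# Assumed feature order of the column pairs (only MOS and BW are stated explicitly).
FEATURES = ["MOS", "BW", "RTT", "JTR", "DJB", "SNR"]

def error_metrics(table):
    """Compute RMSE, MAE, and MAPE per feature from an (N, 12) table.

    Columns are read as (original, predicted) pairs, one pair per feature.
    """
    table = np.asarray(table, dtype=float)
    metrics = {}
    for k, name in enumerate(FEATURES):
        actual, predicted = table[:, 2 * k], table[:, 2 * k + 1]
        err = predicted - actual
        metrics[name] = {
            "RMSE": float(np.sqrt(np.mean(err ** 2))),
            "MAE": float(np.mean(np.abs(err))),
            # MAPE in percent; undefined if any original value is zero.
            "MAPE": float(100.0 * np.mean(np.abs(err / actual))),
        }
    return metrics

# Synthetic table: perfect predictions everywhere except for MOS.
table = np.ones((2, 12))
table[:, 0] = [4.0, 2.0]  # original MOS
table[:, 1] = [3.0, 3.0]  # predicted MOS
metrics = error_metrics(table)
print(metrics["MOS"])
```

A real file could be passed in through np.loadtxt("mob_g722_cnn.txt"), assuming it is whitespace-delimited and has no header row.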

Submitted by Mario Di Mauro on Sat, 03/18/2023 - 17:12