UGR'16 Tensor Time-Series Dataset

Citation Author(s):
Mehdi Shajari, University of Toronto
Hongxiang Geng, University of Toronto
Kaixuan Hu, University of Toronto
Alberto Leon-Garcia, University of Toronto
Submitted by: Hongxiang Geng
Last updated: Thu, 04/07/2022 - 22:17
DOI: 10.21227/ma99-6j85

Abstract 

This dataset supports network anomaly detection and is derived from the network traffic flows of the UGR'16 dataset. It contains a set of tensors generated from the whole UGR'16 network traffic (general tensors) and several sets of port tensors, each for a specific port number, together with the trained models for each type of tensor. The models were trained on tensors generated from the raw flow data of June weeks 2 to 4. The tensors extracted from the network traffic between July week 5 and the end of August can be used for evaluation. The naming convention is as follows:

  • Model: {week period}_{weekend or weekday}_{morning or evening}_{detection period}_{optional port}_model
  • Evaluation Tensors: {week number}_tensors_all_{detection period}_{optional port}
  • Training Tensors: {week number}_tensors_{detection period}_{optional port}
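
The three patterns above can be sketched as small helper functions. This is only an illustration: the function names are hypothetical, the field values are whatever strings the dataset actually uses (they are not listed here), and the sketch assumes that a missing optional port is dropped together with its underscore.

```python
from typing import Optional


def _join(*parts: Optional[str]) -> str:
    """Join the non-empty parts with underscores, skipping an absent port."""
    return "_".join(p for p in parts if p)


def model_name(week_period: str, day_type: str, time_of_day: str,
               detection_period: str, port: Optional[str] = None) -> str:
    # {week period}_{weekend or weekday}_{morning or evening}_{detection period}_{optional port}_model
    return _join(week_period, day_type, time_of_day, detection_period, port, "model")


def evaluation_tensors_name(week_number: str, detection_period: str,
                            port: Optional[str] = None) -> str:
    # {week number}_tensors_all_{detection period}_{optional port}
    return _join(week_number, "tensors", "all", detection_period, port)


def training_tensors_name(week_number: str, detection_period: str,
                          port: Optional[str] = None) -> str:
    # {week number}_tensors_{detection period}_{optional port}
    return _join(week_number, "tensors", detection_period, port)
```

The only structural difference between the two tensor patterns is the extra "all" component in the evaluation names.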

After decompressing a model GZIP file, you should find a best_model.pt file, which can be loaded with PyTorch using the model structure defined in model.py. Training and evaluation tensors differ in content: the training tensors contain only traffic flows labelled as background, while the evaluation tensors contain traffic labelled as background as well as different simulated attacks. The port tensors use a different third dimension from the general tensors.
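
A minimal sketch of the decompress-and-load step follows. It rests on several assumptions that are not confirmed here: that the GZIP file wraps a single file rather than a tar archive, that best_model.pt stores a PyTorch state_dict, and that the caller constructs the model from the class in model.py (its name and constructor arguments are not stated on this page). The function names gunzip and load_best_model are hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src: str, dst: str) -> Path:
    """Decompress the single-file GZIP archive `src` into the file `dst`."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return Path(dst)


def load_best_model(checkpoint_path: str, model):
    """Load the weights in best_model.pt into `model` and return it.

    `model` must be an instance of the network class defined in the
    dataset's model.py; the caller is responsible for constructing it.
    """
    import torch  # imported lazily so gunzip() works without PyTorch installed

    # Assumption: the checkpoint is a state_dict, not a pickled model object.
    state_dict = torch.load(checkpoint_path, map_location="cpu")
    model.load_state_dict(state_dict)
    model.eval()  # switch to inference mode for anomaly scoring
    return model
```

If the archive turns out to be a tar file, Python's tarfile module would replace gunzip; if the checkpoint holds a full model object rather than a state_dict, the loading call would need to change accordingly.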

All tensors provided in this dataset are three-dimensional, of size 64 by 64 by 64. The dimensions of the general tensors are source IP, destination IP, and destination port. The meaning of each index is defined at the beginning of util.py. The port tensors differ only in the third dimension, which represents the number of bytes instead of the destination port. As with the general tensors, the indices of each dimension of the port tensors are defined in util.py.
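
The layout described above can be checked with a short NumPy sketch. It assumes the axes are stored in the order listed (source IP, destination IP, then destination port or bytes) and that a tensor has already been loaded into a NumPy array; the on-disk format and the meaning of each cell value are not specified here, so the summation is purely illustrative. All names in the block are hypothetical.

```python
import numpy as np

TENSOR_SHAPE = (64, 64, 64)
# Assumed axis order: source IP, destination IP, third dimension
# (destination port for general tensors, number of bytes for port tensors).
SRC_IP_AXIS, DST_IP_AXIS, THIRD_AXIS = 0, 1, 2


def check_tensor(t: np.ndarray) -> np.ndarray:
    """Raise ValueError unless `t` has the documented 64 x 64 x 64 shape."""
    if t.shape != TENSOR_SHAPE:
        raise ValueError(f"expected shape {TENSOR_SHAPE}, got {t.shape}")
    return t


def collapse_third_axis(t: np.ndarray) -> np.ndarray:
    """Sum over the third dimension, leaving a 64 x 64 matrix indexed by
    source-IP bucket and destination-IP bucket."""
    return check_tensor(t).sum(axis=THIRD_AXIS)
```

Mapping an index back to a concrete IP range, port, or byte range requires the definitions in util.py, which this sketch does not reproduce.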

The UGR'16 dataset is available at:

https://nesg.ugr.es/nesg-ugr16/

For a description of the UGR'16 dataset, please see the following paper:

Gabriel Maciá Fernández, José Camacho, Roberto Magán-Carrión, Pedro García-Teodoro, Roberto Theron, "UGR'16: A new dataset for the evaluation of cyclostationarity-based network IDSs," Computers & Security, 2017.

Instructions: 

Each GZIP file corresponds to one specific type of tensor, used either to train or to evaluate the models.
