TELCO

Citation Author(s):
Gastón García González, Universidad de la República
Sergio Martínez Tagliafico, Universidad de la República
Alicia Fernández, Universidad de la República
Gabriel Gómez, Universidad de la República
José Acuña, Telefónica Uruguay
Pedro Casas, AIT Austrian Institute of Technology
Submitted by: Gaston Garcia G...
Last updated: Wed, 08/02/2023 - 14:39
DOI: 10.21227/skpg-0539

Abstract 

A recent study [1] warns about the limitations of evaluating anomaly detection algorithms on popular time-series datasets such as Yahoo, Numenta, or NASA, among others. In particular, these datasets are noted to suffer from known flaws such as trivial anomalies, unrealistic anomaly density, mislabeled ground truth, and run-to-failure bias. The TELCO dataset comprises twelve time-series, sampled every five minutes, collected and manually labeled over seven months, from January 1 to July 31, 2021. Series of this length are seldom found in publicly available datasets of this nature, and they make it possible to analyze long-term seasonal behavior. Each time-series aggregates data from different sources; to preserve business confidentiality, we do not specify the exact data type reflected by each time-series. The twelve time-series are typical of the data monitored by a mobile ISP, including the number and amount of prepaid data transfer fees, the number and cost of calls, the volume of data traffic, the number of SMS messages, and more.

[1] R. Wu and E. Keogh, “Current Time Series Anomaly Detection Benchmarks are Flawed and are Creating the Illusion of Progress,” IEEE Transactions on Knowledge and Data Engineering, pp. 1–1, 2021.

Instructions: 

The dataset is provided in .csv format. In each file, the first column contains the timestamps, and each remaining column holds one of the univariate series that make up the TELCO multivariate series. The temporal granularity is five minutes per sample.
The files are split into training (January - March), validation (April), and test (May - July) sets.
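
As a rough illustration of the layout described above, the following sketch reads one file with the Python standard library and checks the five-minute spacing. It rests on several assumptions that this page does not confirm: that each file has a header row, that timestamps are in an ISO 8601 form such as "2021-01-01 00:00:00", and that empty cells mark missing values. The function names (load_series, find_gaps, expected_samples) are hypothetical helpers, not part of the dataset.

```python
import csv
import io
from datetime import datetime, timedelta

STEP = timedelta(minutes=5)  # sampling period stated in the dataset description


def load_series(source):
    """Read one CSV: first column = timestamps, remaining columns = series.

    Assumes a header row and ISO 8601 timestamps; adjust the parsing if the
    actual files use another format.
    """
    reader = csv.reader(source)
    header = next(reader)
    names = header[1:]
    timestamps = []
    columns = {name: [] for name in names}
    for row in reader:
        if not row:
            continue  # skip blank lines
        timestamps.append(datetime.fromisoformat(row[0]))
        for name, cell in zip(names, row[1:]):
            # Assumption: an empty cell means a missing value.
            columns[name].append(float(cell) if cell else float("nan"))
    return timestamps, columns


def find_gaps(timestamps, step=STEP):
    """Return (previous, current) pairs whose spacing differs from `step`."""
    return [(a, b) for a, b in zip(timestamps, timestamps[1:]) if b - a != step]


def expected_samples(start, end, step=STEP):
    """Number of samples in the half-open interval [start, end)."""
    return (end - start) // step


if __name__ == "__main__":
    # Synthetic stand-in for a real file; column names are made up.
    demo = io.StringIO(
        "time,series_1,series_2\n"
        "2021-01-01 00:00:00,10.0,3.5\n"
        "2021-01-01 00:05:00,11.0,\n"
        "2021-01-01 00:15:00,12.0,4.0\n"
    )
    ts, cols = load_series(demo)
    print(len(ts), "rows;", len(find_gaps(ts)), "irregular interval(s)")
```

For a real file, pass an open file object instead of the in-memory string, for example open(path, newline=""). If the files cover complete calendar months with no missing samples, which is not stated on this page, expected_samples gives 25,920 rows for training (90 days), 8,640 for validation (30 days), and 26,496 for test (92 days), or 61,056 in total at 288 samples per day.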

Comments

I'm Emilio, I'll need it for some tests. Thanks!!

Submitted by Emilio Martinez on Wed, 12/13/2023 - 12:12

Nice dataset. May I download the dataset?

Submitted by Hengtao He on Tue, 12/26/2023 - 08:53