A recent study by Wu and Keogh warns about the limitations of evaluating anomaly detection algorithms on popular time-series datasets such as Yahoo, Numenta, or NASA, among others. In particular, these datasets are noted to suffer from known flaws such as trivial anomalies, unrealistic anomaly density, mislabeled ground truth, and run-to-failure bias. The TELCO dataset consists of twelve different time-series, with a temporal granularity of five minutes per sample, collected and manually labeled over a period of seven months, between January 1 and July 31, 2021. A temporal span of this length is seldom found in other publicly available datasets of this nature, and it is highly relevant for analyzing long-term seasonal behavior. Each time-series corresponds to aggregated data from different sources; to preserve business confidentiality, we do not specify the exact data type reflected by each time-series. The twelve time-series are typical of the data monitored in a mobile ISP, including the number and amount of prepaid data transfer fees, number and cost of calls, the volume of data traffic, number of SMS, and more.
R. Wu and E. Keogh, “Current Time Series Anomaly Detection Benchmarks are Flawed and are Creating the Illusion of Progress,” IEEE Transactions on Knowledge and Data Engineering, pp. 1–1, 2021.
The dataset is available in .csv format. In each file, the first column contains the timestamps, and the remaining columns contain the univariate series that make up the TELCO multivariate series. The temporal granularity is five minutes per sample.
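As an illustration of the file layout described above, the following is a minimal sketch of how one such file could be parsed and checked for the five-minute sampling step. It relies on assumptions that the description does not confirm: a header row, ISO 8601 timestamps (for example "2021-01-01 00:05:00"), and purely numeric values with no empty cells. The function names, column names, and sample values are hypothetical.

```python
import csv
import io
from datetime import datetime, timedelta

SAMPLING_STEP = timedelta(minutes=5)  # granularity stated in the dataset description


def load_telco_csv(stream):
    """Parse one TELCO-style CSV: first column timestamps, remaining columns series.

    Assumes (not confirmed by the dataset description) a header row,
    ISO 8601 timestamps, and numeric values without missing entries.
    """
    reader = csv.reader(stream)
    header = next(reader)
    series_names = header[1:]
    timestamps = []
    series = {name: [] for name in series_names}
    for row in reader:
        if not row:  # skip blank lines
            continue
        timestamps.append(datetime.fromisoformat(row[0]))
        for name, value in zip(series_names, row[1:]):
            series[name].append(float(value))
    return timestamps, series


def irregular_steps(timestamps, step=SAMPLING_STEP):
    """Return the (previous, current) timestamp pairs whose spacing differs from `step`."""
    return [(a, b) for a, b in zip(timestamps, timestamps[1:]) if b - a != step]


# Tiny in-memory example with hypothetical column names and values.
sample = io.StringIO(
    "timestamp,series_1,series_2\n"
    "2021-01-01 00:00:00,10,0.5\n"
    "2021-01-01 00:05:00,12,0.7\n"
    "2021-01-01 00:15:00,11,0.6\n"
)
timestamps, series = load_telco_csv(sample)
gaps = irregular_steps(timestamps)
print(len(timestamps), sorted(series), len(gaps))
```

With a real file, the same function can be called on an open file handle, for example `open(path, newline="")`, where `path` points to one of the dataset's .csv files. Any pair returned by `irregular_steps` marks a place where the series deviates from the five-minute grid.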
The files are separated into training (January to March), validation (April), and test (May to July) sets.
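The split boundaries above can be sketched in code as follows. This is only an illustration: the helper names are hypothetical, and the sample counts assume a gap-free series with one sample every five minutes from the first to the last day of each period, which the description does not guarantee.

```python
from datetime import datetime

# Month ranges taken from the split description; the year 2021 comes from
# the stated collection period (January 1 to July 31, 2021).
SPLIT_MONTHS = {
    "training": range(1, 4),    # January to March
    "validation": range(4, 5),  # April
    "test": range(5, 8),        # May to July
}


def split_for(timestamp):
    """Return the split a timestamp falls in, or None if outside January to July 2021."""
    if timestamp.year != 2021:
        return None
    for name, months in SPLIT_MONTHS.items():
        if timestamp.month in months:
            return name
    return None


def expected_samples(first_month, last_month, year=2021, step_minutes=5):
    """Samples a gap-free series would contain from first_month to last_month, inclusive."""
    start = datetime(year, first_month, 1)
    if last_month == 12:
        end = datetime(year + 1, 1, 1)
    else:
        end = datetime(year, last_month + 1, 1)
    return int((end - start).total_seconds() // (step_minutes * 60))


for name, months in SPLIT_MONTHS.items():
    print(name, expected_samples(months[0], months[-1]))
```

Under these assumptions, each series would hold 288 samples per day, so the three periods of 90, 30, and 92 days correspond to 25920, 8640, and 26496 samples respectively. Comparing these figures with the actual row counts is a quick way to spot missing intervals.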