AD2S experiments

Name: AD2S experiments
Creator: Fengrui Liu
License: https://creativecommons.org/licenses/by/4.0/
Keywords: Other

Citation Author(s): Fengrui Liu
Submitted by: Fengrui Liu
Last updated: Mon, 07/08/2024 - 19:59
DOI: 10.21227/ye8y-er02


Categories:

Other

Keywords:

Anomaly Detection

online detection

sporadic data streams


Abstract

For Internet-based service companies, anomaly detection on data streams is critical for troubleshooting and for maintaining service quality and reliability. Most known detection methods assume that the data are continuous. In practice, however, many real-world data streams are sporadic. This poses particular challenges for anomaly detection, because the common preprocessing step of downsampling sporadic data can omit potential anomalies and delay alarms.
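To make the downsampling problem concrete, the following is a minimal sketch, not taken from the AD2S code. The stream values, the 60-second window, and the alarm threshold are all made up for illustration. It averages an irregularly sampled stream into fixed windows and compares threshold alarms on the raw points with alarms on the windowed means.

```python
from statistics import mean

# Hypothetical sporadic stream of (timestamp_seconds, value) pairs.
# Arrivals are irregular; the spike at t=75 is the anomaly.
stream = [(2, 10.0), (5, 10.2), (9, 9.9), (48, 10.1), (61, 10.0),
          (75, 50.0), (76, 10.1), (79, 9.8), (118, 10.0)]


def downsample(points, window):
    """Average the values that fall into each fixed-width time window."""
    buckets = {}
    for t, v in points:
        buckets.setdefault(t // window, []).append(v)
    # Key each window by its start time.
    return {k * window: mean(vs) for k, vs in sorted(buckets.items())}


THRESHOLD = 25.0  # hypothetical alarm threshold

# Alarms when every raw observation is checked as it arrives.
raw_alarms = [t for t, v in stream if v > THRESHOLD]

# Alarms when only the 60-second window means are checked.
windowed = downsample(stream, window=60)
window_alarms = [start for start, v in windowed.items() if v > THRESHOLD]

print("raw alarms at:", raw_alarms)
print("window means:", windowed)
print("window alarms at:", window_alarms)
```

In this toy stream the spike is averaged with four normal readings in the same window, so the window mean stays below the threshold and the anomaly is missed. Even with a lower threshold, the window mean is only available once the window closes, which is later than the arrival of the anomalous point itself.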

In this paper, we propose an adaptive anomaly detection method on sporadic data streams named AD2S. It consists of two modules: a monitor module to continuously and adaptively determine the measure windows for observations, and a detection module that utilizes an isolation partition strategy to estimate the anomaly degree of each incoming observation.
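The general idea behind isolation-based scoring can be illustrated with a short sketch. This is a generic one-dimensional random-partition example in the spirit of isolation forests, not the AD2S isolation partition strategy or its monitor module. The function names, the reference values, the number of trials, and the seed are all assumptions chosen for illustration.

```python
import random


def isolation_depth(x, sample, rng, max_depth=32):
    """Count random splits needed to separate x from the reference sample."""
    subset = list(sample) + [x]
    depth = 0
    while len(subset) > 1 and depth < max_depth:
        lo, hi = min(subset), max(subset)
        if lo == hi:
            break  # only duplicates of x remain; no further split is possible
        split = rng.uniform(lo, hi)
        # Keep only the values on the same side of the split as x.
        subset = [v for v in subset if (v < split) == (x < split)]
        depth += 1
    return depth


def mean_depth(x, sample, n_trials=200, seed=0):
    """Average isolation depth over repeated random partitions."""
    rng = random.Random(seed)
    total = sum(isolation_depth(x, sample, rng) for _ in range(n_trials))
    return total / n_trials


# Hypothetical reference observations clustered between 9.5 and 10.5.
reference = [9.5 + 0.05 * i for i in range(21)]

inlier_depth = mean_depth(10.02, reference)
outlier_depth = mean_depth(50.0, reference)

print("mean depth, inlier :", inlier_depth)
print("mean depth, outlier:", outlier_depth)
```

A value far from the reference cluster is usually separated by the first split, so its mean depth is close to 1, while a value inside the cluster needs several splits. A lower mean depth therefore corresponds to a higher anomaly degree in this simplified setting.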

Our analysis demonstrates that the proposed method has constant amortized time and space complexity.

Based on experimental results on both synthetic and public real-world datasets, our method outperforms other state-of-the-art methods in anomaly detection on sporadic data streams, and the code is open-sourced.

Instructions:

# AD2S

Official repo for "AD2S: Adaptive Anomaly Detection on Sporadic Data Streams"

## Installing dependencies

To install the dependencies defined for this project, run the install command:

```bash
# Install the poetry package manager
curl -sSL https://install.python-poetry.org | python3 -

# And then install the dependencies
poetry install
```

## Generate the synthetic dataset

Before you generate the synthetic dataset, you need to set the `root_path` of this project in the `utils/syn_config.yaml` file. You can also use the data directly from the `data` folder.

To generate the synthetic dataset, run the following command:

```bash
python utils/synthetic.py --multirun data.synthetic_ds=1,2,3,4
```

This will generate the 4 different synthetic datasets defined in the paper. The generated datasets will be stored in the `data` folder.

Alternatively, you can generate a single dataset by running the following command after setting the `data.synthetic_ds` flag to the desired dataset number.

```bash
python utils/synthetic.py
```

## Tutorial

We provide a quick tutorial on how to use the code in this repo. The tutorial is available in `tutorial.ipynb`. You can follow the anomaly scores and parameters that you're interested in.

## Case Study

The case study can be found in `experiments/case_study.ipynb`.

![Case1](./experiments/case_study/case1.png)

![Case2](./experiments/case_study/case2.png)

## Experiments

```
experiments
├── ablation
├── case_study
├── comparison
├── concept_drift
├── parameters_init_p
└── parameters_n_chains
```

All the experiments can be found in the `experiments` folder, including the ablation study, the comparison with other methods, and the effect of the parameters. All the source code and results are listed in the corresponding folders.
