- Citation Author(s):
- Submitted by: Fengrui Liu
- Last updated: Mon, 12/26/2022 - 10:50
For Internet-based service companies, anomaly detection on data streams is critical for troubleshooting and for maintaining service quality and reliability. Most known detection methods implicitly assume that the data are always continuous. In practical applications, however, many real-world data streams are sporadic. This poses particular challenges for anomaly detection, because the common preprocessing step of downsampling sporadic data can omit potential anomalies and delay alarms.
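To make the downsampling problem concrete, here is a minimal sketch, separate from the AD2S code. It averages a sporadic stream into fixed 60-second buckets and compares threshold alarms on the raw and downsampled data. The stream values, the `THRESHOLD` constant, and the `downsample_mean` helper are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def downsample_mean(observations, window_seconds):
    """Average (timestamp, value) pairs into fixed-width time buckets."""
    buckets = defaultdict(list)
    for timestamp, value in observations:
        buckets[timestamp // window_seconds].append(value)
    # Key each bucket by its start time in seconds.
    return {
        bucket * window_seconds: sum(values) / len(values)
        for bucket, values in sorted(buckets.items())
    }


# Hypothetical sporadic stream: irregular arrival times (seconds), one spike at t=41.
stream = [(3, 10.0), (5, 11.0), (9, 10.5), (41, 95.0), (44, 10.0), (58, 9.5), (130, 10.2)]
THRESHOLD = 50.0  # illustrative alarm threshold, not taken from the paper

# Alarms when every raw observation is checked as it arrives.
raw_alarms = [t for t, v in stream if v > THRESHOLD]

# Alarms when only the per-bucket averages are checked.
downsampled = downsample_mean(stream, window_seconds=60)
bucket_alarms = [t for t, v in downsampled.items() if v > THRESHOLD]
```

In this toy stream the spike is averaged away inside its bucket, so the downsampled view raises no alarm. Even if the bucket average had crossed the threshold, it would only be known once the 60-second bucket closed.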
In this paper, we propose AD2S, an adaptive anomaly detection method for sporadic data streams. It consists of two modules: a monitor module that continuously and adaptively determines the measure windows for observations, and a detection module that uses an isolation partition strategy to estimate the anomaly degree of each incoming observation.
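The intuition behind isolation-based scoring can be sketched in a few lines. The example below is a generic, isolation-forest-style illustration on one-dimensional values. It is not the AD2S detection module, and `isolation_depth` and `mean_isolation_depth` are hypothetical helper names. The idea is that a point far from the rest of a window is separated by fewer random splits than a point inside the bulk of the data.

```python
import random


def isolation_depth(window, x, rng, max_depth=32):
    """Count random splits needed to separate x from the values in window."""
    points = list(window) + [x]
    depth = 0
    while len(points) > 1 and depth < max_depth:
        lo, hi = min(points), max(points)
        if lo == hi:  # only duplicates of x remain, nothing left to split
            break
        split = rng.uniform(lo, hi)
        # Keep only the side of the split that contains x.
        if x < split:
            points = [p for p in points if p < split]
        else:
            points = [p for p in points if p >= split]
        depth += 1
    return depth


def mean_isolation_depth(window, x, trials=200, seed=0):
    """Average isolation depth over several random partitions (lower = more anomalous)."""
    rng = random.Random(seed)
    return sum(isolation_depth(window, x, rng) for _ in range(trials)) / trials


# Hypothetical window of typical values, plus two candidate observations.
window = [10 + 0.01 * i for i in range(50)]
outlier_depth = mean_isolation_depth(window, 100.0)
inlier_depth = mean_isolation_depth(window, 10.25)
```

Here `outlier_depth` comes out smaller than `inlier_depth`, which is the signal an isolation-based detector turns into an anomaly score.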
Our analysis demonstrates that the proposed method has constant amortized time and space complexity.
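As a general illustration of constant per-observation cost in a streaming setting, and not a reproduction of the paper's analysis or data structures, the sketch below maintains a running mean over a bounded window. The class name `BoundedWindowMean` and the fixed `capacity` are assumptions made for this example.

```python
from collections import deque


class BoundedWindowMean:
    """Running mean over the last `capacity` observations."""

    def __init__(self, capacity):
        self.values = deque(maxlen=capacity)  # memory never exceeds `capacity` items
        self.total = 0.0

    def update(self, value):
        # If the window is full, the oldest value is evicted by the append below,
        # so remove its contribution from the running total first.
        if len(self.values) == self.values.maxlen:
            self.total -= self.values[0]
        self.values.append(value)
        self.total += value
        return self.total / len(self.values)


window_mean = BoundedWindowMean(capacity=3)
means = [window_mean.update(v) for v in [1, 2, 3, 4]]
```

Each update touches a fixed number of values no matter how long the stream runs, and the window never holds more than `capacity` items.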
Experimental results on both synthetic and public real-world datasets show that our method outperforms other state-of-the-art methods in anomaly detection on sporadic data streams. The code is open-sourced.
Official repo for "AD2S: Adaptive Anomaly Detection on Sporadic Data Streams"
## Installing dependencies
To install the dependencies defined for this project, install the Poetry package manager and then run its install command:

    # Install the Poetry package manager
    curl -sSL https://install.python-poetry.org | python3 -
    # Then install the dependencies
    poetry install
## Generate the synthetic dataset
Before generating the synthetic dataset, you need to configure the `root_path` of this project in the `utils/syn_config.yaml` file. Alternatively, you can use the data in the `data` folder directly.
To generate the synthetic dataset, run the following command:

    python utils/synthetic.py --multirun data.synthetic_ds=1,2,3,4
This generates the four synthetic datasets defined in the paper and stores them in the `data` folder.
Alternatively, you can generate a single dataset by setting the `data.synthetic_ds` flag to the desired dataset number.
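The generator can also be driven from Python. This minimal sketch only assembles the command line documented above for a chosen set of dataset numbers. `build_generation_command` is a hypothetical helper, not part of the repo, and it assumes `python` is on your `PATH` and that you run it from the project root.

```python
def build_generation_command(dataset_ids):
    """Assemble the argument list for utils/synthetic.py for the given dataset numbers."""
    ids = ",".join(str(i) for i in dataset_ids)
    return ["python", "utils/synthetic.py", "--multirun", f"data.synthetic_ds={ids}"]


command = build_generation_command([1, 2, 3, 4])

# To actually run the generator from the project root (not executed here):
#   import subprocess
#   subprocess.run(command, check=True)
```

Passing the arguments as a list rather than one shell string avoids quoting issues when the command is handed to `subprocess.run`.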
## Tutorial
We provide a quick tutorial on how to use the code in this repo, available in `tutorial.ipynb`. You can follow it to inspect the anomaly scores and the parameters you're interested in.
## Case Study
The case study can be found in `experiments/case_study.ipynb`.
## Experiments
All the experiments can be found in the `experiments` folder, including the ablation study, comparison with other methods, and the effect of the parameters. All the source code and results are listed in the corresponding folders.
- AD2S_dataset.zip (22.80 kB)
- AD2S_code.zip (7.17 MB)