Data Preprocessing
Anomaly detection plays a crucial role in many domains, including cybersecurity, space science, finance, and healthcare. However, the lack of standardized benchmark datasets hinders the comparative evaluation of anomaly detection algorithms. In this work, we address this gap by presenting a curated collection of preprocessed spacecraft anomaly datasets drawn from multiple sources. These datasets cover a diverse range of anomalies and real-world spacecraft scenarios.
- Categories:
The data included here are the model training results associated with the paper "Distribution-Driven Augmentation of Real-World Datasets for Improved Cancer Diagnostics With Machine Learning". The paper focuses on using kernel density estimators to curate datasets by balancing classes and filling missing values through synthetically generated data. Additionally, the manuscript proposes a technique for joining distinct datasets so that a model can be trained with the necessary features from multiple datasets, as a type of transfer learning.
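As a rough illustration of class balancing with a kernel density estimator, the sketch below oversamples each minority class by drawing synthetic rows from a Gaussian KDE fitted to that class. It is not the paper's implementation: the Gaussian kernel, the per-feature Scott's-rule bandwidth, the toy data, and the function names `kde_sample` and `balance_classes` are all assumptions made for this example. Filling missing values is not covered here.

```python
import random
import statistics


def kde_sample(rows, n_new, rng):
    """Draw n_new synthetic rows from a Gaussian KDE fitted to `rows`.

    Sampling from a Gaussian KDE amounts to picking a stored row at random
    and adding Gaussian noise scaled by the bandwidth.
    Assumes `rows` holds at least two rows (needed for the standard deviation).
    """
    n, d = len(rows), len(rows[0])
    # Assumed bandwidth choice: Scott's rule per feature, h_j = sigma_j * n^(-1/(d+4)).
    factor = n ** (-1.0 / (d + 4))
    bandwidths = [statistics.stdev(col) * factor for col in zip(*rows)]
    samples = []
    for _ in range(n_new):
        center = rng.choice(rows)
        samples.append([rng.gauss(x, h) for x, h in zip(center, bandwidths)])
    return samples


def balance_classes(X, y, seed=0):
    """Oversample every class up to the size of the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    # Original rows are kept first; synthetic rows are appended after them.
    X_out, y_out = list(X), list(y)
    for label, rows in by_class.items():
        extra = kde_sample(rows, target - len(rows), rng)
        X_out.extend(extra)
        y_out.extend([label] * len(extra))
    return X_out, y_out


# Toy data (made up for illustration): four rows of class 0, two of class 1.
X = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.1], [1.1, 2.2], [5.0, 6.0], [5.2, 6.1]]
y = [0, 0, 0, 0, 1, 1]
X_bal, y_bal = balance_classes(X, y)
```

After the call, both classes have the same number of rows, and the two added class-1 rows are noisy copies of existing class-1 rows rather than exact duplicates. A product of independent per-feature kernels is used only to keep the sketch dependency-free; a full-covariance bandwidth would capture correlations between features.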
- Categories:
Summary: An archive of the DCLN project (https://sourceforge.net/projects/dcln/) is provided.
Code & Script: Written in C/C++ and run through shell scripts on a Linux system. A mature DCLNv2 package is available for download. See './dcln.sh' for usage information.
Document: Details of hyperparameter tuning, data preprocessing, and code compilation are given.
Data: Four nonlinear simulation datasets are provided (Fig. 2 of the main paper). Each study has ~2000 training samples and ~2000 test samples.
---
- Categories: