The advent of the Industrial Internet of Things (IIoT) has led to the availability of huge amounts of data, that can be used to train advanced Machine Learning algorithms to perform tasks such as Anomaly Detection, Fault Classification and Predictive Maintenance. Most of them are already capable of logging warnings and alarms occurring during operation. Turning this data, which is easy to collect, into meaningful information about the health state of machinery can have a disruptive impact on the improvement of efficiency and up-time. The provided dataset consists of a sequence of alarms logged by packaging equipment in an industrial environment. The collection includes data logged by 20 machines, deployed in different plants around the world, from 2019-02-21 to 2020-06-17. There are 154 distinct alarm codes, whose distribution is highly unbalanced.
In this dataset, we provide both raw and processed data. As for raw data, raw/alarms.csv is a comma-separated file with a row for each logged alarm. Each row provides the alarm code, the timestamp of occurrence, and the identifier of the piece of equipment generating the alarm. From this file, it is possible to generate data for tasks such as those described in the abstract. For the sake of completeness, we also provide the Python code to process data and generate input and output sequences that can be used to address the task of predicting which alarms will occur in a future time window, given the sequence of all alarms occurred in a previous time window (processed/all_alarms.pickle, processed/all_alarms.json, and processed/all_alarms.npz). The Python module to process raw data into input/output sequences is dataset.py. In particular, function create_dataset allows creating sequences already split in train/test and stored in a pickle file. It is also possible to use create_dataset_json and create_dataset_npz to obtain different output formats for the processed dataset. The ready-to-use datasets provided in the zipped folder were created by considering an input of 1720 minutes and an output window of 480 minutes. More information can be found in the attached readme.md file.