Alarm Logs in Packaging Industry (ALPI)

Citation Author(s):
Galdi S.r.l.
Dalle Pezze
Università degli Studi di Padova
Statwolf Data Science S.r.l.
Gian Antonio
Università degli Studi di Padova
Università degli Studi di Padova
Submitted by:
Diego Tosato
Last updated:
Thu, 07/30/2020 - 10:09
Data Format:


The advent of the Industrial Internet of Things (IIoT) has made huge amounts of data available that can be used to train advanced Machine Learning algorithms for tasks such as Anomaly Detection, Fault Classification and Predictive Maintenance. Although not all pieces of equipment are fitted with sensors yet, most of them can already log the warnings and alarms that occur during operation. Turning this data, which is easy to collect, into meaningful information about the health state of machinery can have a disruptive impact on efficiency and up-time. The provided dataset consists of a sequence of alarms logged by packaging equipment in an industrial environment. The collection includes data logged by 20 machines, deployed in different plants around the world, from 2019-02-21 to 2020-06-17. There are 154 distinct alarm codes, whose distribution is highly unbalanced. This data can be used to address the following tasks:

  1. Next alarm forecasting: this problem can be framed as a supervised multi-class classification task, or a binary classification task when a specific alarm code is considered.
  2. Predicting alarms occurring in a future time frame: here the goal is to forecast the occurrence of certain alarm types in a future time window. Since many alarms can occur, this is a supervised multi-label classification.
  3. Future alarm sequence prediction: here the goal is predicting an ordered sequence of future alarms, in a sequence-to-sequence forecasting scenario.
  4. Anomaly Detection: the task is to detect abnormal equipment conditions, based on the pattern of the alarm sequence. This task can be either unsupervised, if only the input sequence is considered, or supervised, if future alarms are taken into account to assess whether or not there is an anomaly.
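The difference between the targets of tasks 1 and 2 can be illustrated with a minimal sketch. The helper names `next_alarm_pairs` and `multi_hot`, the history length, and the toy alarm codes are all assumptions made for illustration; none of them come from the dataset or its accompanying code.

```python
from typing import List, Sequence, Tuple


def next_alarm_pairs(codes: Sequence[int], history: int) -> List[Tuple[List[int], int]]:
    """Task 1 (hypothetical helper): pair each run of `history` consecutive
    alarms with the single alarm that follows it (multi-class target)."""
    return [
        (list(codes[i:i + history]), codes[i + history])
        for i in range(len(codes) - history)
    ]


def multi_hot(codes: Sequence[int], n_codes: int) -> List[int]:
    """Task 2 (hypothetical helper): encode the set of alarms observed in a
    future window as a 0/1 vector with one entry per alarm code (multi-label target)."""
    target = [0] * n_codes
    for code in codes:
        target[code] = 1
    return target


# Toy alarm codes, made up for illustration (not taken from the dataset).
sequence = [3, 7, 3, 1, 7]
pairs = next_alarm_pairs(sequence, history=2)

# Alarms assumed to occur in some future window; repeats collapse to a single 1.
future_target = multi_hot([7, 7, 1], n_codes=8)
```

With the real data, `n_codes` would presumably be 154, and a binary variant of task 1 would only keep whether the next alarm equals one chosen code.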

All of the above tasks can also be studied from a continual learning perspective. Indeed, the serial code of the specific piece of equipment can be used as information to train the model; however, a scalable model should also be easy to apply to new machines, without having to be retrained from scratch.

The collection and release of this dataset has been supported by the Regione Veneto project PreMANI (MANIFATTURA PREDITTIVA: progettazione, sviluppo e implementazione di soluzioni di Digital Manufacturing per la previsione della Qualità e la Manutenzione Intelligente - PREDICTIVE MAINTENANCE: design, development and implementation of Digital Manufacturing solutions for the intelligent quality and maintenance systems).


In this dataset, we provide both raw and processed data. As for raw data, raw/alarms.csv is a comma-separated file with one row for each logged alarm. Each row provides the alarm code, the timestamp of occurrence, and the identifier of the piece of equipment that generated the alarm. From this file, it is possible to generate data for tasks such as those described in the abstract. For the sake of completeness, we also provide the Python code to process the data and generate input and output sequences for the task of predicting which alarms will occur in a future time window, given the sequence of all alarms that occurred in a previous time window (processed/all_alarms.pickle, processed/all_alarms.json, and processed/all_alarms.npz). In the provided Python module, which processes raw data into input/output sequences, the function create_dataset creates sequences already split into train/test sets and stores them in a pickle file. It is also possible to use create_dataset_json and create_dataset_npz to obtain different output formats for the processed dataset. The ready-to-use datasets provided in the zipped folder were created by considering an input window of 1720 minutes and an output window of 480 minutes. More information can be found in the attached file.
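The windowing idea described above can be sketched as follows. This is not the provided create_dataset function, whose signature and internals are not reproduced here. It is a simplified illustration in which `make_windows`, the `(timestamp, alarm_code, machine_id)` record layout, and the choice of sliding the window forward by one output window at a time are all assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def make_windows(records, input_minutes=1720, output_minutes=480):
    """Hypothetical helper: build (machine, past_alarms, future_alarm_set) samples.

    `records` is an iterable of (timestamp, alarm_code, machine_id) tuples,
    an assumed in-memory layout for the rows of the raw alarm log.
    """
    in_len = timedelta(minutes=input_minutes)
    out_len = timedelta(minutes=output_minutes)

    # Group alarms by machine so that windows never mix different equipment.
    by_machine = defaultdict(list)
    for ts, code, machine in records:
        by_machine[machine].append((ts, code))

    samples = []
    for machine, events in by_machine.items():
        events.sort()  # chronological order
        start, end = events[0][0], events[-1][0]
        while start + in_len + out_len <= end:
            split = start + in_len
            # Input: ordered alarms in [start, split).
            past = [c for t, c in events if start <= t < split]
            # Output: distinct alarms in [split, split + out_len).
            future = sorted({c for t, c in events if split <= t < split + out_len})
            samples.append((machine, past, future))
            start += out_len  # assumed stride: one output window
    return samples


# Toy log with short windows so the result is easy to follow (made-up values).
toy = [
    (datetime(2019, 2, 21, 0, 0), 11, 0),
    (datetime(2019, 2, 21, 0, 20), 12, 0),
    (datetime(2019, 2, 21, 0, 50), 11, 0),
    (datetime(2019, 2, 21, 1, 10), 13, 0),
    (datetime(2019, 2, 21, 2, 0), 12, 0),
]
samples = make_windows(toy, input_minutes=60, output_minutes=30)
```

Calling `make_windows` with its defaults would use the 1720-minute input and 480-minute output windows mentioned above. An empty future set means no alarm occurred in that output window, and such samples may need separate handling depending on the task.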