Transformer Electrocardiogram Biometrics Dataset
Many of the publicly available electrocardiogram (ECG) databases either contain few people with longer recordings each, or many people with shorter recordings each. As a result, splitting a single database into training, testing, and, optionally, validation datasets is challenging: models tend to do better with larger training sets, but that leaves only a small amount of data for testing. Moreover, if the ECG is segmented by heartbeat, the data are further limited by the number of heartbeats in each recording. Combining multiple databases to enlarge the dataset is also difficult, because it requires reconciling differences across databases: different measuring devices, measuring conditions, sampling rates, types of noise, etc.

This dataset was generated using blind segmentation as a data augmentation technique, producing a large amount of training and validation data. The procedure is not limited by the number of heartbeats in an ECG recording. Multiple ECG databases were combined to increase the total number of subjects and to provide more ECG variation; in total, 10 databases were used to generate the training and validation datasets. The large amount of data with wide variation supports training a generalized model.
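Blind segmentation, as described above, crops fixed-length windows at arbitrary offsets rather than aligning segments to detected heartbeats. The sketch below illustrates the idea in NumPy; the segment length, segment count, and the `blind_segments` helper itself are illustrative assumptions, not the exact parameters used to build this dataset.

```python
import numpy as np

def blind_segments(ecg, segment_len, n_segments, rng=None):
    """Crop n_segments fixed-length windows at random offsets,
    independent of heartbeat locations (blind segmentation)."""
    rng = rng if rng is not None else np.random.default_rng()
    starts = rng.integers(0, len(ecg) - segment_len + 1, size=n_segments)
    return np.stack([ecg[s:s + segment_len] for s in starts])

# Stand-in for a single-lead ECG recording.
recording = np.sin(np.linspace(0.0, 100.0, 5000))
segments = blind_segments(recording, segment_len=512, n_segments=8)
print(segments.shape)  # (8, 512)
```

Because window starts are sampled freely over the recording, the number of distinct segments is bounded by the number of samples, not by the number of heartbeats, which is what makes the augmentation effectively unlimited.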
The .tfrecord filenames are prefixed with "training" or "validation". Each example is a dict with the keys "label", "d0", and "d1": "label" holds the position within the classification scope at which the query identity matches, "d0" holds the query ECG segment, and "d1" holds the ECG segments of the classification scope.
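A TFRecord reading sketch is shown below. The dataset description names the keys ("label", "d0", "d1") but not their dtypes or shapes, so the feature spec here is an assumption: an int64 label plus float32 ECG tensors serialized with `tf.io.serialize_tensor`. The demo writes one synthetic example and parses it back; with the real files you would pass the `training*.tfrecord` / `validation*.tfrecord` paths to `tf.data.TFRecordDataset`.

```python
import tensorflow as tf

# Assumed feature layout (dtypes/shapes are not documented by the dataset).
FEATURE_SPEC = {
    "label": tf.io.FixedLenFeature([], tf.int64),
    "d0": tf.io.FixedLenFeature([], tf.string),  # serialized query segment
    "d1": tf.io.FixedLenFeature([], tf.string),  # serialized scope segments
}

def parse_example(serialized):
    ex = tf.io.parse_single_example(serialized, FEATURE_SPEC)
    return {
        "label": ex["label"],
        "d0": tf.io.parse_tensor(ex["d0"], tf.float32),
        "d1": tf.io.parse_tensor(ex["d1"], tf.float32),
    }

# Round-trip demo with one synthetic example.
def make_example(label, d0, d1):
    def bytes_feat(t):
        return tf.train.Feature(bytes_list=tf.train.BytesList(
            value=[tf.io.serialize_tensor(t).numpy()]))
    return tf.train.Example(features=tf.train.Features(feature={
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
        "d0": bytes_feat(d0),
        "d1": bytes_feat(d1),
    })).SerializeToString()

path = "demo.tfrecord"
with tf.io.TFRecordWriter(path) as w:
    w.write(make_example(3,
                         tf.zeros([512], tf.float32),       # query segment
                         tf.ones([100, 512], tf.float32)))  # scope segments

ds = tf.data.TFRecordDataset([path]).map(parse_example)
for ex in ds:
    print(int(ex["label"]), tuple(ex["d0"].shape), tuple(ex["d1"].shape))
```

The scope size (100) and segment length (512) above are placeholders; inspect one real example to recover the actual shapes before building a model input pipeline.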