Standard Dataset

Transformer Electrocardiogram Biometrics Dataset

Citation Author(s):
Kai Jye Chee (School of Electrical and Electronic Engineering, USM Engineering Campus, Universiti Sains Malaysia, Nibong Tebal 14300, Malaysia)
Dzati Athiar Ramli (School of Electrical and Electronic Engineering, USM Engineering Campus, Universiti Sains Malaysia, Nibong Tebal 14300, Malaysia)
Submitted by:
Kai Jye Chee
DOI:
10.21227/syhd-3948

Abstract

Many publicly available electrocardiogram (ECG) databases contain either few subjects, each with long recordings, or many subjects, each with short recordings. As a result, splitting a single database into training, testing, and, optionally, validation sets is challenging. Some models appear to perform better with larger training sets, but that leaves only a small set of data for testing. Moreover, if the ECG is segmented by heartbeat, the amount of data is further limited by the number of heartbeats in each recording. Combining multiple databases to enlarge the dataset is also difficult, because the differences across databases must be reconciled, potentially including different measuring devices, measuring conditions, sampling rates, and types of noise. A dataset generation procedure that uses blind segmentation as a data augmentation technique is used here to generate a large amount of training and validation data. This procedure is not limited by the number of heartbeats in an ECG recording. Multiple ECG databases are combined to increase the total number of subjects and to provide more ECG variation. In total, 10 databases were used to generate the training and validation datasets. The large volume of widely varied data was used to train a generalized model.
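The key property claimed for blind segmentation is that the number of segments is not bounded by the number of heartbeats. The following is a minimal sketch, assuming blind segmentation means cutting fixed-length windows at arbitrary start positions without detecting heartbeats or R-peaks. The function name `blind_segments`, the parameters `seg_len` and `n_segments`, and the uniform random-offset scheme are hypothetical; the authors' actual segment length and selection rules are not stated in this description.

```python
import random
from typing import List, Sequence


def blind_segments(signal: Sequence[float], seg_len: int, n_segments: int,
                   seed: int = 0) -> List[List[float]]:
    """Hypothetical sketch: cut n_segments fixed-length windows at random offsets.

    No heartbeat detection is performed, so start positions are arbitrary and
    the number of segments is not limited by the number of heartbeats.
    """
    if seg_len > len(signal):
        raise ValueError("segment longer than recording")
    rng = random.Random(seed)            # seeded for reproducible datasets
    max_start = len(signal) - seg_len    # last valid start index
    segments = []
    for _ in range(n_segments):
        start = rng.randint(0, max_start)  # randint's upper bound is inclusive
        segments.append(list(signal[start:start + seg_len]))
    return segments


# Example: a 10 s recording sampled at a hypothetical 500 Hz yields 5000 samples;
# drawing 100 windows of 1000 samples (2 s) each.
recording = [0.0] * 5000
windows = blind_segments(recording, seg_len=1000, n_segments=100, seed=1)
print(len(windows), len(windows[0]))
```

Because windows may overlap, one recording can yield many more training segments than it has heartbeats, which is the property the abstract relies on.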

Instructions:

The .tfrecord filenames are prefixed with either "training" or "validation". Each example is a dict with the keys "label", "d0", and "d1". "d0" contains the query ECG segment, "d1" contains the ECG segments of the classification scope, and "label" contains the position, within the classification scope, of the identity that matches the query.
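To illustrate how "label", "d0", and "d1" relate, the sketch below predicts a position in the classification scope and compares it with "label". It assumes the examples have already been decoded into Python lists; reading the files would typically go through TensorFlow's `tf.data.TFRecordDataset` and `tf.io.parse_single_example`, but the feature dtypes and shapes are not stated here, so that step is omitted. Cosine similarity on raw samples is a hypothetical stand-in for a trained model, and the helper names are illustrative only.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def predict_position(d0: Sequence[float], d1: Sequence[Sequence[float]]) -> int:
    """Hypothetical matcher: index of the scope segment most similar to the query."""
    scores = [cosine(d0, candidate) for candidate in d1]
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(examples: List[Dict]) -> float:
    """Fraction of examples whose predicted position equals "label"."""
    hits = sum(predict_position(ex["d0"], ex["d1"]) == ex["label"] for ex in examples)
    return hits / len(examples)


# Toy decoded example: the query matches the third scope entry, so label is 2.
example = {
    "label": 2,
    "d0": [1.0, 0.0, 1.0],
    "d1": [[0.0, 1.0, 0.0], [1.0, 1.0, -1.0], [0.9, 0.1, 1.1]],
}
print(predict_position(example["d0"], example["d1"]))
```

In this framing, identification is a choice among the segments in "d1", so accuracy is simply how often the chosen index equals "label".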

Funding Agency
Ministry of Higher Education Malaysia
Grant Number
FRGS/1/2020/ICT03/USM/02/1