Standard Dataset
Biometric Datasets for Federated Learning with Privacy and Integrity Constraints (SigD, BIDMC, TBME)
- Submitted by: Hao Zhou
- Last updated: Fri, 04/25/2025 - 04:46
- DOI: 10.21227/bj9y-7s54
Abstract
This dataset collection supports the research presented in the manuscript titled “Privacy-preserving and Verifiable Federated Learning for Biometric Data in Edge Computing” (submitted to IEEE Transactions on Knowledge and Data Engineering). It includes three curated biometric datasets—SigD, BIDMC, and TBME—that are used to evaluate the BPVFL framework’s performance in privacy-preserving and verifiable federated learning scenarios.
SigD contains digital signature dynamics captured from stylus-based handwriting on mobile devices. BIDMC provides photoplethysmography (PPG) recordings from intensive care unit patients, widely used in biomedical signal processing research. TBME comprises multi-session PPG signals collected in controlled environments for biometric verification studies.
These datasets are used to simulate federated learning environments with realistic edge node distributions, emphasizing non-IID data, high-dimensional feature processing, and multi-class identity classification. Each dataset is preprocessed for federated simulation and annotated with standard metadata.
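One way to realize a non-IID client split is to assign whole user identities to clients, so that each client sees only the samples of its own users (label skew). The sketch below is a minimal illustration under stated assumptions. Each sample is taken to be an in-memory `(features, user_id)` pair, and the function name `partition_by_user`, its `num_clients` parameter, and the round-robin assignment over sorted user IDs are all hypothetical choices. They are not the partitioning scheme used in the paper.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Assumed in-memory sample layout: (feature vector, user ID label).
Sample = Tuple[Sequence[float], int]


def partition_by_user(samples: List[Sample], num_clients: int) -> Dict[int, List[Sample]]:
    """Assign every user ID to exactly one client (hypothetical round-robin scheme).

    All samples of a given user land on the same client, so clients hold
    disjoint label sets, which produces a label-skewed (non-IID) split.
    """
    if num_clients < 1:
        raise ValueError("num_clients must be >= 1")
    # Group samples by user identity.
    by_user: Dict[int, List[Sample]] = defaultdict(list)
    for features, user_id in samples:
        by_user[user_id].append((features, user_id))
    # Round-robin over sorted user IDs keeps the assignment deterministic.
    clients: Dict[int, List[Sample]] = {c: [] for c in range(num_clients)}
    for i, user_id in enumerate(sorted(by_user)):
        clients[i % num_clients].extend(by_user[user_id])
    return clients


if __name__ == "__main__":
    toy = [([0.1, 0.2], 0), ([0.3, 0.1], 1), ([0.5, 0.9], 2), ([0.2, 0.2], 0)]
    split = partition_by_user(toy, num_clients=2)
    for cid, data in split.items():
        # Prints each client ID with the user IDs it holds.
        print(cid, sorted({uid for _, uid in data}))
```

With the toy data, client 0 receives users 0 and 2 and client 1 receives user 1. A random shuffle of user IDs, or a Dirichlet-based label allocation, could be substituted when a less extreme skew is wanted.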
The package contains the three datasets (SigD, BIDMC, and TBME), each preprocessed and formatted for federated learning experiments in biometric verification and privacy-preserving model training.
Each dataset is partitioned by user identity to simulate non-IID data distributions across clients in a federated learning setup. Feature files are provided in .pl format; each row holds the features of one biometric sample together with its class label (the user ID). To reproduce the results reported in the paper, follow the preprocessing instructions and feed the provided features into your federated training pipeline. Load the data independently for each client, so that no client can access another client's samples, as a privacy-aware simulation requires.
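The per-client loading step can be sketched as follows. This page does not specify how the .pl files are serialized or which column holds the label, so the sketch starts from rows that have already been read into memory. It assumes that the label (user ID) is the last column of each row. The names `ClientData`, `split_row`, and `load_client` are hypothetical and illustrative only.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class ClientData:
    """Local view of one client: only its own features and labels."""
    client_id: int
    features: List[List[float]] = field(default_factory=list)
    labels: List[int] = field(default_factory=list)


def split_row(row: Sequence[float]) -> Tuple[List[float], int]:
    """Split one row into (features, label).

    Assumption: the class label (user ID) is stored in the LAST column.
    """
    if len(row) < 2:
        raise ValueError("row must contain at least one feature and a label")
    return [float(v) for v in row[:-1]], int(row[-1])


def load_client(client_id: int, rows: Sequence[Sequence[float]]) -> ClientData:
    """Build a client's dataset from its own rows only.

    Each call creates fresh lists, so no data is shared between clients.
    """
    data = ClientData(client_id)
    for row in rows:
        x, y = split_row(row)
        data.features.append(x)
        data.labels.append(y)
    return data


if __name__ == "__main__":
    client = load_client(3, [[0.5, 1.5, 7], [2.0, 0.0, 9]])
    print(client.client_id, client.features, client.labels)
```

Keeping each client's data in a separate object mirrors the instruction to load data independently per client. The federated training loop then only ever passes a client its own `ClientData`.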