Dataset For PerturbVFL

- Citation Author(s): Haoran Cheng
- Submitted by: Cheng Haoran
- DOI: 10.21227/kfs3-tn90
Abstract
Vertical Federated Learning (VFL) enables multiple organizations to collaboratively train machine learning models without sharing raw data. It is particularly suited to tabular datasets in which the parties hold the same samples (aligned sample IDs) but disjoint sets of features. Despite the growing relevance of VFL in privacy-sensitive sectors such as finance and healthcare, publicly available benchmarks for VFL on tabular data remain limited. This paper introduces and categorizes a collection of real-world tabular datasets tailored for VFL research, highlighting their feature distributions, domain applicability, and security relevance. We also discuss preprocessing protocols, partition strategies, and potential use cases, with the aim of supporting standardized evaluation and fostering reproducible research in VFL on structured data.
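To make the setting concrete, the sketch below shows one possible way to split a tabular dataset vertically: every party keeps the shared sample ID, the feature columns are divided into disjoint groups, and only the first party holds the label. This page does not specify the partition strategy used for these datasets, so the function `vertical_partition`, its parameters, the seeded round-robin split, and the toy records are all hypothetical and serve only as an illustration.

```python
import random


def vertical_partition(records, id_key, label_key, n_parties, seed=0):
    """Split feature columns across parties (hypothetical helper).

    Every party keeps the sample ID so rows stay aligned; feature
    columns are disjoint; only party 0 (the "active" party) keeps
    the label. This is an assumed scheme, not the dataset's own.
    """
    if not records:
        raise ValueError("records must be non-empty")
    features = [k for k in records[0] if k not in (id_key, label_key)]
    if not 1 <= n_parties <= len(features):
        raise ValueError("n_parties must be between 1 and the number of features")

    # Shuffle with a fixed seed so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(features)

    # Deal the shuffled features out round-robin, one group per party.
    groups = [sorted(features[i::n_parties]) for i in range(n_parties)]

    parties = []
    for idx, cols in enumerate(groups):
        keep = [id_key] + cols
        if idx == 0:
            keep.append(label_key)  # the label stays with the active party
        parties.append([{k: row[k] for k in keep} for row in records])
    return parties


# Toy records, invented purely for illustration.
records = [
    {"id": 1, "age": 34, "income": 52000, "balance": 1200, "visits": 3, "label": 0},
    {"id": 2, "age": 51, "income": 87000, "balance": 560, "visits": 7, "label": 1},
    {"id": 3, "age": 27, "income": 41000, "balance": 9800, "visits": 1, "label": 0},
]
parties = vertical_partition(records, "id", "label", n_parties=2)

for idx, view in enumerate(parties):
    print(f"party {idx} columns:", sorted(view[0]))
```

Each element of `parties` is one organization's local view. The rows appear in the same order with the same IDs, which stands in for the ID alignment step that a real VFL deployment would typically perform with a privacy-preserving protocol.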