Dataset For PerturbVFL

Citation Author(s):
Haoran Cheng
Submitted by:
Cheng Haoran
DOI:
10.21227/kfs3-tn90

Abstract

Vertical Federated Learning (VFL) enables multiple organizations to collaboratively train machine learning models without sharing raw data. It is particularly suited to tabular datasets in which the parties hold the same samples, aligned by sample ID, but disjoint sets of features. Despite the growing relevance of VFL in privacy-sensitive sectors such as finance and healthcare, publicly available benchmarks for VFL on tabular data remain limited. This paper introduces and categorizes a collection of real-world tabular datasets tailored for VFL research, highlighting their feature distributions, domain applicability, and security relevance. We also discuss preprocessing protocols, partition strategies, and potential use cases, with the aim of supporting standardized evaluation and fostering reproducible research in VFL on structured data.
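
To make the vertical setting concrete, the following is a minimal sketch of how a single tabular dataset might be split into per-party views that share sample IDs but hold disjoint features. It is an illustration only, not the partition strategy used for this dataset: the function name `vertical_partition`, the toy table, and the column names are all hypothetical.

```python
from typing import Dict, List, Sequence

Row = Dict[str, float]


def vertical_partition(
    table: Dict[str, Row],
    feature_groups: Sequence[Sequence[str]],
) -> List[Dict[str, Row]]:
    """Split a table keyed by sample ID into one view per party.

    Every party keeps all sample IDs (aligned samples) but only its own
    group of feature columns (disjoint feature spaces).
    """
    # Reject any feature that is assigned to more than one party.
    seen = set()
    for group in feature_groups:
        overlap = seen.intersection(group)
        if overlap:
            raise ValueError(
                f"features assigned to more than one party: {sorted(overlap)}"
            )
        seen.update(group)

    # Build one view per party, restricted to that party's columns.
    return [
        {sid: {name: row[name] for name in group} for sid, row in table.items()}
        for group in feature_groups
    ]


# Hypothetical toy table; IDs, columns, and values are made up for illustration.
table = {
    "id_001": {"age": 34.0, "income": 52000.0, "balance": 1200.0, "num_loans": 1.0},
    "id_002": {"age": 51.0, "income": 87000.0, "balance": 300.0, "num_loans": 3.0},
    "id_003": {"age": 27.0, "income": 41000.0, "balance": 9800.0, "num_loans": 0.0},
}

party_a, party_b = vertical_partition(
    table,
    [["age", "income"], ["balance", "num_loans"]],
)
```

Here `party_a` and `party_b` contain the same three sample IDs, while no feature column appears in both views. In a real deployment the parties would start from separate tables and align their IDs first, typically with a privacy-preserving matching step that this sketch does not cover.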
