Genomic Data; Synthetic Data; Phenotype; Sub-populations; SNPs

SynGen6 is a synthetic genomic dataset that encompasses six distinct populations. We used Principal Component Analysis (PCA) and ϵ-local differential privacy (LDP) to generate synthetic samples. We then simulated phenotype vectors associated with significant SNPs, mirroring real-world gene-disease associations. We also generated synthetic SNPs to watermark the dataset, enabling verification of outsourced computations. Lastly, we created synthetic relatives to support research on kinship inference and family-based genomic analyses.
