Genomic Data; Synthetic Data; Phenotype; Sub-populations; SNPs
SynGen6 is a synthetic genomic dataset covering six distinct populations. We used Principal Component Analysis (PCA) and ϵ-local differential privacy (LDP) to generate synthetic samples. We then simulated phenotype vectors associated with significant SNPs, mirroring real-world gene-disease associations. We also generated synthetic SNPs to watermark the dataset, enabling verification of outsourced computations. Lastly, we created synthetic relatives to support research on kinship inference and family-based genomic analyses.
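As a rough illustration of what an ϵ-LDP perturbation of genotype data can look like, the sketch below applies k-ary randomized response to SNP genotypes. This is not the SynGen6 pipeline, which the description above does not specify. It assumes genotypes are coded as minor-allele counts 0, 1, or 2, and the function names `keep_probability` and `k_randomized_response` are hypothetical. The PCA step is omitted.

```python
import numpy as np

def keep_probability(epsilon, k=3):
    """Probability of reporting the true value under k-ary randomized response."""
    return np.exp(epsilon) / (np.exp(epsilon) + k - 1)

def k_randomized_response(genotypes, epsilon, k=3, rng=None):
    """Perturb each genotype independently so that each entry satisfies epsilon-LDP.

    Assumption: genotypes are integers in {0, ..., k-1} (here 0/1/2 allele counts).
    """
    rng = np.random.default_rng() if rng is None else rng
    genotypes = np.asarray(genotypes)
    # Decide, per entry, whether to report the true value.
    keep = rng.random(genotypes.shape) < keep_probability(epsilon, k)
    # An offset in 1..k-1 (mod k) always yields a different value,
    # chosen uniformly among the k-1 alternatives.
    offsets = rng.integers(1, k, size=genotypes.shape)
    flipped = (genotypes + offsets) % k
    return np.where(keep, genotypes, flipped)

# Example: a toy matrix of 4 individuals by 5 SNPs.
toy = np.array([[0, 1, 2, 0, 1],
                [2, 2, 0, 1, 0],
                [1, 0, 0, 2, 2],
                [0, 0, 1, 1, 2]])
noisy = k_randomized_response(toy, epsilon=1.0, rng=np.random.default_rng(42))
```

A smaller ϵ lowers the keep probability toward 1/k, giving stronger privacy at the cost of noisier genotypes. Note that the guarantee here is per entry, so perturbing many SNPs for one individual consumes privacy budget for each of them.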