Genomic Data; Synthetic Data; Phenotype; Sub-populations; SNPs
SynGen6 is a synthetic genomic dataset that spans six distinct populations. We used Principal Component Analysis (PCA) and ϵ-local differential privacy (LDP) to generate the synthetic samples. We then simulated phenotype vectors associated with significant SNPs, mirroring real-world gene-disease associations. We also generated synthetic SNPs that watermark the dataset, enabling verification of outsourced computations. Finally, we created synthetic relatives to support research on kinship inference and family-based genomic analyses.
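The description does not say exactly how PCA and ϵ-LDP are combined, so the following is only a minimal sketch of one plausible scheme, not the SynGen6 pipeline. It assumes genotypes coded as 0/1/2 minor-allele counts, treats the PCA mean and axes as public (for example, fitted on a reference panel), and has each record add Laplace noise to its own projection before reconstruction. The phenotype step uses a simple logistic model on a few causal SNPs. All function names (`pca_axes`, `ldp_synthesize`, `simulate_phenotype`), the causal indices, effect sizes, intercept, and ϵ value are hypothetical.

```python
import math
import random


def pca_axes(X, k, iters=200, seed=0):
    """Top-k principal axes of X (rows = samples, columns = SNPs).

    Power iteration with deflation on the sample covariance matrix;
    adequate for toy sizes only, not genome-scale data.
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(m)]
    Xc = [[row[j] - mean[j] for j in range(m)] for row in X]
    cov = [[sum(r[a] * r[b] for r in Xc) / max(n - 1, 1) for b in range(m)]
           for a in range(m)]
    axes = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(m)]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[a] * cov[a][b] * v[b] for a in range(m) for b in range(m))
        # Deflate so the next power iteration converges to the next component.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(m)] for a in range(m)]
        axes.append(v)
    return mean, axes


def laplace(rng, scale):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def ldp_synthesize(X, k, epsilon, seed=0):
    """Each record perturbs its own k PCA coordinates with Laplace noise.

    ASSUMPTION: mean and axes are public; only the per-record projection is
    privatized. With genotypes in {0, 1, 2}, |v . (x - x')| <= 2 * ||v||_1 per
    axis, so the L1 sensitivity of the projection is <= 2 * sum_j ||v_j||_1.
    """
    rng = random.Random(seed + 1)
    mean, axes = pca_axes(X, k, seed=seed)
    m = len(mean)
    sensitivity = 2.0 * sum(sum(abs(c) for c in v) for v in axes)
    scale = sensitivity / epsilon
    synthetic = []
    for row in X:
        centered = [row[j] - mean[j] for j in range(m)]
        z = [sum(v[j] * centered[j] for j in range(m)) for v in axes]
        z_noisy = [zi + laplace(rng, scale) for zi in z]
        # Post-processing (reconstruct, round, clip) preserves the epsilon-LDP guarantee.
        recon = [mean[j] + sum(zi * v[j] for zi, v in zip(z_noisy, axes))
                 for j in range(m)]
        synthetic.append([min(2, max(0, round(x))) for x in recon])
    return synthetic


def simulate_phenotype(G, causal, effects, intercept=-1.0, seed=0):
    """Binary phenotype from a logistic model on hypothetical causal SNPs."""
    rng = random.Random(seed)
    labels = []
    for g in G:
        logit = intercept + sum(b * g[j] for j, b in zip(causal, effects))
        p = 1.0 / (1.0 + math.exp(-logit))
        labels.append(1 if rng.random() < p else 0)
    return labels


# Toy usage: 20 random genotypes over 6 SNPs (allele frequency 0.3).
data_rng = random.Random(42)
real = [[sum(data_rng.random() < 0.3 for _ in range(2)) for _ in range(6)]
        for _ in range(20)]
synthetic = ldp_synthesize(real, k=2, epsilon=4.0)
phenotype = simulate_phenotype(synthetic, causal=[1, 4], effects=[0.8, 1.2])
print(len(synthetic), len(synthetic[0]), sum(phenotype))
```

Smaller ϵ means larger noise scale and therefore synthetic genotypes that drift further from the originals; rounding and clipping to {0, 1, 2} are pure post-processing and do not weaken the privacy guarantee under the stated assumption.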