Variant Analysis of Human Genome Sequences for COVID-19 Research
This data resource is an outcome of the NSF RAPID project titled "Democratizing Genome Sequence Analysis for COVID-19 Using CloudLab," awarded to the University of Missouri-Columbia.
The resource contains the output of variant analysis (along with CADD scores) on human genome sequences obtained from the COVID-19 Data Portal. The variants include single nucleotide polymorphisms (SNPs) and short insertions and deletions (indels).
We will release the variant analysis output of human genome sequences periodically, as more sequences are being made available on the COVID-19 Data Portal. Please visit this page regularly for updates.
If you have comments or questions, please post them in the comments section below.
This work is supported by the National Science Foundation under Grant No. 2034247.
1. Download a .zip file.
2. Unzip the file into a folder.
3. There will be two folders, namely, VCF and CADD_Scores. These folders contain the compressed .vcf and .tsv files. The .vcf files are filtered VCF files produced by the GATK Best Practices workflow for RNA-seq data, with hg19 as the reference genome. There is also a .xlsx file containing the run accession IDs (e.g., SRR12095153) and URLs (e.g., https://www.ebi.ac.uk/ena/browser/view/SRR12095153) from which the paired-end sequences were downloaded. A complete description of the sequences can be found at these URLs.
4. Check for new .zip files.
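The steps above can be sketched in Python as follows. This is a minimal illustration, not the project's own tooling. It assumes the compressed .vcf files are gzip-compressed (the page does not name the compression format), and the function names extract_archive, open_text, classify, and count_variants are hypothetical helpers introduced here. The SNP/indel rule is a simplified length comparison of the REF and ALT alleles.

```python
import gzip
import zipfile
from collections import Counter
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Unzip one downloaded archive into dest_dir and return that folder."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    return dest


def open_text(path):
    """Open a file as text; a .gz suffix is treated as gzip (an assumption)."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def classify(ref, alt):
    """Label one REF/ALT allele pair as 'snp', 'indel', or 'other'."""
    # Missing, spanning-deletion, and symbolic alleles are not counted as SNPs.
    if alt in {".", "*"} or alt.startswith("<"):
        return "other"
    if len(ref) == 1 and len(alt) == 1:
        return "snp"
    if len(ref) != len(alt):
        return "indel"
    return "other"  # e.g. multi-nucleotide substitutions


def count_variants(vcf_path):
    """Count SNP and indel alleles among the records of one VCF file."""
    counts = Counter()
    with open_text(vcf_path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip header lines and blank lines
            fields = line.rstrip("\n").split("\t")
            ref, alts = fields[3], fields[4]  # REF and ALT columns
            for alt in alts.split(","):
                counts[classify(ref, alt)] += 1
    return counts
```

For example, after calling extract_archive("Variant_Analysis_Output_Feb-28_2021_hg19.zip", "variant_output"), where the destination folder name is arbitrary, each file found with Path("variant_output").rglob("*.vcf*") can be passed to count_variants.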
- Variant_Analysis_Output_Feb-28_2021_hg19.zip (41.69 MB)
- Variant_Analysis_Output_Mar-3_2021_hg19.zip (51.94 MB)
- Variant_Analysis_Output_Mar-8_2021_hg19.zip (5.64 MB)
- Variant_Analysis_Output_Mar-14_2021_hg19.zip (83.06 MB)
- Variant_Analysis_Output_Mar-30_2021_hg19.zip (50.11 MB)
- Variant_Analysis_Output_Mar-22_2021_part1_hg19.zip (109.78 MB)
- Variant_Analysis_Output_Mar-22_2021_part2_hg19.zip (77.83 MB)
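Because the archives listed above are not in date order, it can help to sort them by the release date embedded in each file name when checking for new .zip files. The sketch below assumes that future archives keep the same naming pattern as the ones listed here. The names MONTHS, PATTERN, and release_key are hypothetical and introduced only for illustration.

```python
import re
from datetime import date

# Three-letter month abbreviations as they appear in the archive names.
MONTHS = {
    name: number
    for number, name in enumerate(
        "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1
    )
}

# Matches names such as Variant_Analysis_Output_Mar-22_2021_part1_hg19.zip
PATTERN = re.compile(
    r"Variant_Analysis_Output_([A-Za-z]{3})-(\d{1,2})_(\d{4})(?:_part(\d+))?_hg19\.zip"
)


def release_key(filename):
    """Return (release date, part number) parsed from an archive name."""
    match = PATTERN.fullmatch(filename)
    if match is None:
        raise ValueError(f"unrecognized archive name: {filename}")
    month, day, year, part = match.groups()
    # Archives without a _partN suffix get part number 0.
    return date(int(year), MONTHS[month], int(day)), int(part or 0)


archives = [
    "Variant_Analysis_Output_Feb-28_2021_hg19.zip",
    "Variant_Analysis_Output_Mar-3_2021_hg19.zip",
    "Variant_Analysis_Output_Mar-8_2021_hg19.zip",
    "Variant_Analysis_Output_Mar-14_2021_hg19.zip",
    "Variant_Analysis_Output_Mar-30_2021_hg19.zip",
    "Variant_Analysis_Output_Mar-22_2021_part1_hg19.zip",
    "Variant_Analysis_Output_Mar-22_2021_part2_hg19.zip",
]

# Oldest release first; the last entry is the most recent archive.
ordered = sorted(archives, key=release_key)
```

Comparing the last entry of ordered against the most recent archive already downloaded is one simple way to spot a new release.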