Open Access
Variant Analysis of Human Genome Sequences for COVID-19 Research
- Submitted by: Praveen Rao
- Last updated: Sat, 12/04/2021 - 12:39
- DOI: 10.21227/b0ph-s175
Abstract
This data resource is an outcome of the NSF RAPID project titled "Democratizing Genome Sequence Analysis for COVID-19 Using CloudLab", awarded to the University of Missouri-Columbia.
The resource contains the output of variant analysis (along with CADD scores) on human genome sequences obtained from the COVID-19 Data Portal. The variants include single nucleotide polymorphisms (SNPs) and short insertions and deletions (indels).
For variant analysis, we used the GATK Best Practices workflow for RNA-seq data published by the Broad Institute. This workflow was executed on CloudLab, an NSF-funded experimental testbed.
We will release the variant analysis output for additional human genome sequences periodically, as more sequences are being made available on the COVID-19 Data Portal. Please visit this page regularly for updates.
If you have comments or questions, please post them in the comments section below.
Acknowledgments
This work is supported by the National Science Foundation under Grant No. 2034247.
Instructions
1. Download a .zip file from the Dataset Files list.
2. Unzip the file and extract its contents into a folder.
3. The extracted contents include two folders, VCF and CADD_Scores, which contain the compressed .vcf and .tsv files, respectively. The .vcf files are the filtered VCF files produced by the GATK Best Practices workflow for RNA-seq data, using hg19 as the reference genome. There is also a .xlsx file listing the run accession IDs (e.g., SRR12095153) and the URLs (e.g., https://www.ebi.ac.uk/ena/browser/view/SRR12095153) from which the paired-end sequences were downloaded. A complete description of each sequence can be found via these URLs.
4. Check this page periodically for new .zip files.
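The steps above can be sketched in Python. This is a minimal illustration, not part of the dataset's tooling: it assumes the compressed .vcf files use gzip (a .gz suffix), that each file name starts with the run accession ID (as in SRR13113910.unmapped.variant_filtered.vcf), and that the ENA URL pattern shown in step 3 holds for every run. The function names are hypothetical.

```python
import gzip
import zipfile
from collections import Counter
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Unzip one downloaded archive into dest_dir and return that folder."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    return dest


def open_text(path):
    """Open a plain or gzip-compressed text file (assumes .gz means gzip)."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def run_accession(vcf_path):
    """Assumption: the file name starts with the run accession ID."""
    return Path(vcf_path).name.split(".")[0]


def ena_url(accession):
    """Assumption: every run follows the URL pattern shown in step 3."""
    return f"https://www.ebi.ac.uk/ena/browser/view/{accession}"


def count_variant_types(vcf_path):
    """Count SNPs and indels by comparing REF and ALT allele lengths."""
    counts = Counter()
    with open_text(vcf_path) as handle:
        for line in handle:
            if line.startswith("#"):  # meta-information and header lines
                continue
            fields = line.rstrip("\n").split("\t")
            ref = fields[3]
            for alt in fields[4].split(","):
                if alt in {"*", "."} or alt.startswith("<"):
                    counts["other"] += 1  # spanning, missing, or symbolic allele
                elif len(ref) == 1 and len(alt) == 1:
                    counts["snp"] += 1
                elif len(ref) != len(alt):
                    counts["indel"] += 1
                else:
                    counts["other"] += 1  # e.g. multi-nucleotide substitution
    return counts


def summarize_folder(folder):
    """Map each run accession found under folder to its variant counts."""
    return {
        run_accession(vcf): dict(count_variant_types(vcf))
        for vcf in sorted(Path(folder).rglob("*.vcf*"))
    }
```

For example, `summarize_folder(extract_archive("Variant_Analysis_Output_Feb-28_2021_hg19.zip", "variant_output"))` would return one entry per VCF file found in the extracted folder, provided the assumptions above match the archive's actual layout.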
Dataset Files
- Variant_Analysis_Output_Feb-28_2021_hg19.zip (41.69 MB)
- Variant_Analysis_Output_Mar-3_2021_hg19.zip (51.94 MB)
- Variant_Analysis_Output_Mar-8_2021_hg19.zip (5.64 MB)
- Variant_Analysis_Output_Mar-14_2021_hg19.zip (83.06 MB)
- Variant_Analysis_Output_Mar-30_2021_hg19.zip (50.11 MB)
- Variant_Analysis_Output_Mar-22_2021_part1_hg19.zip (109.78 MB)
- Variant_Analysis_Output_Mar-22_2021_part2_hg19.zip (77.83 MB)
- Variant_Analysis_Output_Apr-6_2021_hg19.zip (79.69 MB)
- Variant_Analysis_Output_Apr-9_2021_hg19.zip (60.67 MB)
- Variant_Analysis_Output_Apr-12_2021_hg19.zip (92.30 MB)
- Variant_Analysis_Output_Apr-16_2021_hg19.zip (115.71 MB)
- Variant_Analysis_Output_Apr-18_2021_hg19.zip (137.28 MB)
- Variant_Analysis_Output_Apr-25_2021_part1_hg19.zip (89.80 MB)
- Variant_Analysis_Output_Apr-25_2021_part2_hg19.zip (78.37 MB)
- Variant_Analysis_Output_Apr-30_2021_part1_hg19.zip (85.51 MB)
- Variant_Analysis_Output_Apr-30_2021_part2_hg19.zip (81.83 MB)
- Variant_Analysis_Output_May-7_2021_hg19.zip (142.57 MB)
- Variant_Analysis_Output_May-11_2021_hg19.zip (100.30 MB)
- Variant_Analysis_Output_May-15_2021_hg19.zip (42.94 MB)
- Variant_Analysis_Output_May-23_2021_hg19.zip (122.30 MB)
- Variant_Analysis_Output_May-29_2021_hg19.zip (147.62 MB)
- Variant_Analysis_Output_June-6_2021_hg19.zip (45.07 MB)
- Variant_Analysis_Output_June-13_2021_hg19.zip (156.18 MB)
- Variant_Analysis_Output_June-20_2021_hg19.zip (40.23 MB)
- Variant_Analysis_Output_June-27_2021_hg19.zip (43.71 MB)
- Variant_Analysis_Output_July-3_2021_hg19.zip (42.15 MB)
- Variant_Analysis_Output_July-10_2021_hg19.zip (38.96 MB)
- Variant_Analysis_Output_July-17_2021_hg19.zip (37.38 MB)
- Variant_Analysis_Output_July-24_2021_hg19.zip (39.08 MB)
- Variant_Analysis_Output_Jul-31_2021_hg19.zip (31.88 MB)
- Variant_Analysis_Output_Aug-7_2021_hg19.zip (38.83 MB)
- Variant_Analysis_Output_Aug-14_2021_hg19.zip (81.66 MB)
- Variant_Analysis_Output_Aug-21_2021_hg19.zip (60.63 MB)
- Variant_Analysis_Output_Aug-28_2021_hg19.zip (65.59 MB)
Open Access dataset files are accessible to all logged-in users with a free IEEE account. IEEE membership is not required.
Comments
CADD score files have been updated.
If you would like to perform variant annotation on the VCF files, please use SnpEff (http://pcingola.github.io/SnpEff/). See https://pcingola.github.io/SnpEff/se_running/ for instructions on downloading and running it.
Here is an example of how to annotate variants using snpEff.jar:
$ java -Xmx8g -jar snpEff.jar hg19 SRR13113910.unmapped.variant_filtered.vcf > SRR13113910.unmapped.variant_filtered_ann_hg19.vcf
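To annotate many files at once, the same command can be driven from a short script. The following Python sketch is only an illustration under stated assumptions: the VCF files have already been decompressed to plain .vcf, java is on the PATH, snpEff.jar is in the working directory, and the hg19 database is available to SnpEff. The function names are hypothetical; the command line and the output naming mirror the example above.

```python
import subprocess
from pathlib import Path


def annotated_path(vcf_path, genome="hg19"):
    """Name the output like the example: <input stem>_ann_<genome>.vcf."""
    vcf_path = Path(vcf_path)
    return vcf_path.with_name(f"{vcf_path.stem}_ann_{genome}.vcf")


def build_snpeff_command(vcf_path, jar="snpEff.jar", genome="hg19", max_heap="8g"):
    """Build the same argument list as the command shown above."""
    return ["java", f"-Xmx{max_heap}", "-jar", str(jar), genome, str(vcf_path)]


def annotate_folder(folder, jar="snpEff.jar", genome="hg19"):
    """Run SnpEff on every plain .vcf file in folder; return the output paths."""
    outputs = []
    for vcf in sorted(Path(folder).glob("*.vcf")):
        if vcf.stem.endswith(f"_ann_{genome}"):
            continue  # skip files that are already annotation outputs
        out = annotated_path(vcf, genome)
        with open(out, "w") as handle:
            # SnpEff writes the annotated VCF to standard output
            subprocess.run(
                build_snpeff_command(vcf, jar, genome),
                stdout=handle,
                check=True,
            )
        outputs.append(out)
    return outputs
```

Because `check=True` raises an error when SnpEff exits with a non-zero status, a failed run may leave an empty or partial output file behind, which should be deleted before retrying.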