Variant Analysis of Human Genome Sequences for COVID-19 Research

Name: Variant Analysis of Human Genome Sequences for COVID-19 Research
Creator: Praveen Rao
License: https://creativecommons.org/licenses/by/4.0/
Keywords: Biomedical and Health Sciences, COVID-19

Citation Author(s):
Praveen Rao, University of Missouri-Columbia
Arun Zachariah, University of Missouri-Columbia
Deepthi Rao, University of Missouri-Columbia
Peter Tonellato, University of Missouri-Columbia
Wesley Warren, University of Missouri-Columbia
Eduardo Simoes, University of Missouri-Columbia
Submitted by: Praveen Rao
Last updated: Sat, 12/04/2021 - 12:39
DOI: 10.21227/b0ph-s175
Data Format: VCF, TSV, GZIP, ZIP
Links: Project GitHub Site
License: Creative Commons Attribution

Categories: Biomedical and Health Sciences, COVID-19
Keywords: RNA-seq, paired-end sequencing, COVID-19, human genomes, variant analysis

Abstract

This data resource is an outcome of the NSF RAPID project titled "Democratizing Genome Sequence Analysis for COVID-19 Using CloudLab" awarded to University of Missouri-Columbia.

The resource contains the output of variant analysis (along with CADD scores) on human genome sequences obtained from the COVID-19 Data Portal. The variants include single nucleotide polymorphisms (SNPs) and short insertions and deletions (indels).

For variant analysis, we used the GATK Best Practices workflow for RNA-seq data published by the Broad Institute. This workflow was executed on CloudLab, an NSF-funded experimental testbed.

We will release the variant analysis output of human genome sequences periodically, as more sequences are made available on the COVID-19 Data Portal. Please visit this page regularly for updates.

If you have comments or questions, please post them in the comments section below.

Acknowledgments

This work is supported by the National Science Foundation under Grant No. 2034247.

Instructions:

1. Download a .zip file.

2. Extract the contents of the .zip file into a folder.

3. The extracted archive contains two folders, VCF and CADD_Scores, which hold the compressed .vcf and .tsv files. The .vcf files are the filtered VCF files produced by the GATK Best Practices workflow for RNA-seq data, using hg19 as the reference genome. A .xlsx file is also included; it lists the run accession IDs (e.g., SRR12095153) and the URLs (e.g., https://www.ebi.ac.uk/ena/browser/view/SRR12095153) from which the paired-end sequences were downloaded. A complete description of each sequence is available at these URLs.

4. Check for new .zip files.
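
The steps above can be scripted. The following Python sketch extracts one downloaded archive and counts SNPs and indels in each VCF file. It is only an illustration under stated assumptions: the compressed VCF files are assumed to be gzip files ending in .vcf.gz, the output folder name variant_output is hypothetical, and every record is counted regardless of its FILTER value.

```python
import gzip
import zipfile
from collections import Counter
from pathlib import Path


def classify_variant(ref: str, alt: str) -> str:
    """Label one REF/ALT pair as 'snp', 'indel', or 'other'."""
    if alt in {".", "*"} or alt.startswith("<"):
        return "other"  # missing, spanning-deletion, or symbolic allele
    if len(ref) == 1 and len(alt) == 1:
        return "snp"
    if len(ref) != len(alt):
        return "indel"
    return "other"  # e.g. multi-nucleotide substitutions


def count_variants(lines) -> Counter:
    """Count variant types over an iterable of VCF text lines."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta lines, and the column header
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4]  # REF and ALT columns
        for alt in alts.split(","):  # ALT may list several alleles
            counts[classify_variant(ref, alt)] += 1
    return counts


def summarize_archive(zip_path: str, out_dir: str = "variant_output") -> dict:
    """Extract a dataset .zip and summarize every *.vcf.gz file inside it.

    Assumption: the compressed VCF files carry a .vcf.gz suffix.
    """
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out_dir)
    summary = {}
    for vcf in sorted(Path(out_dir).rglob("*.vcf.gz")):
        with gzip.open(vcf, "rt") as handle:
            summary[vcf.name] = count_variants(handle)
    return summary
```

For example, summarize_archive("Variant_Analysis_Output_Feb-28_2021_hg19.zip") would return one Counter per VCF file found in the archive. If the files use a different suffix, adjust the glob pattern accordingly.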

Comments

CADD score files have been updated.

Submitted by Praveen Rao on Mon, 04/12/2021 - 21:49

If you would like to perform variant annotation on the VCF files, please use SnpEff (http://pcingola.github.io/SnpEff/). See https://pcingola.github.io/SnpEff/se_running/ for instructions on downloading and running it.

Here is an example of how to annotate variants using snpEff.jar:

$ java -Xmx8g -jar snpEff.jar hg19 SRR13113910.unmapped.variant_filtered.vcf > SRR13113910.unmapped.variant_filtered_ann_hg19.vcf

Submitted by Praveen Rao on Mon, 11/01/2021 - 14:56
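
To apply the snpEff command from the comment above to a whole folder of VCF files, a small wrapper script may help. The sketch below is not part of the project's tooling. It assumes that java is on the PATH, that snpEff.jar and the hg19 database are already installed, and that the VCF files have been decompressed to plain .vcf. The function names are hypothetical; the output naming follows the pattern used in the example command.

```python
import subprocess
from pathlib import Path


def annotated_path(vcf: Path, genome: str = "hg19") -> Path:
    """Derive the output name, e.g. x.vcf -> x_ann_hg19.vcf."""
    # Path.stem drops only the final ".vcf" suffix.
    return vcf.with_name(f"{vcf.stem}_ann_{genome}.vcf")


def build_snpeff_command(vcf, jar="snpEff.jar", genome="hg19", memory="8g"):
    """Build the argument list for the snpEff invocation shown above."""
    return ["java", f"-Xmx{memory}", "-jar", str(jar), genome, str(vcf)]


def annotate_folder(folder, jar="snpEff.jar", genome="hg19"):
    """Annotate every plain .vcf file in a folder; return the output paths."""
    outputs = []
    for vcf in sorted(Path(folder).glob("*.vcf")):
        if f"_ann_{genome}" in vcf.stem:
            continue  # skip files that were already annotated
        out = annotated_path(vcf, genome)
        with open(out, "w") as handle:
            # snpEff writes the annotated VCF to standard output.
            subprocess.run(
                build_snpeff_command(vcf, jar, genome),
                stdout=handle,
                check=True,
            )
        outputs.append(out)
    return outputs
```

Calling annotate_folder("VCF") would then write one annotated file next to each input file. Passing the command as a list avoids shell quoting problems with unusual file names.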

Dataset Files

Variant_Analysis_Output_Feb-28_2021_hg19.zip (41.69 MB)
Variant_Analysis_Output_Mar-3_2021_hg19.zip (51.94 MB)
Variant_Analysis_Output_Mar-8_2021_hg19.zip (5.64 MB)
Variant_Analysis_Output_Mar-14_2021_hg19.zip (83.06 MB)
Variant_Analysis_Output_Mar-30_2021_hg19.zip (50.11 MB)
Variant_Analysis_Output_Mar-22_2021_part1_hg19.zip (109.78 MB)
Variant_Analysis_Output_Mar-22_2021_part2_hg19.zip (77.83 MB)
Variant_Analysis_Output_Apr-6_2021_hg19.zip (79.69 MB)
Variant_Analysis_Output_Apr-9_2021_hg19.zip (60.67 MB)
Variant_Analysis_Output_Apr-12_2021_hg19.zip (92.30 MB)
Variant_Analysis_Output_Apr-16_2021_hg19.zip (115.71 MB)
Variant_Analysis_Output_Apr-18_2021_hg19.zip (137.28 MB)
Variant_Analysis_Output_Apr-25_2021_part1_hg19.zip (89.80 MB)
Variant_Analysis_Output_Apr-25_2021_part2_hg19.zip (78.37 MB)
Variant_Analysis_Output_Apr-30_2021_part1_hg19.zip (85.51 MB)
Variant_Analysis_Output_Apr-30_2021_part2_hg19.zip (81.83 MB)
Variant_Analysis_Output_May-7_2021_hg19.zip (142.57 MB)
Variant_Analysis_Output_May-11_2021_hg19.zip (100.30 MB)
Variant_Analysis_Output_May-15_2021_hg19.zip (42.94 MB)
Variant_Analysis_Output_May-23_2021_hg19.zip (122.30 MB)
Variant_Analysis_Output_May-29_2021_hg19.zip (147.62 MB)
Variant_Analysis_Output_June-6_2021_hg19.zip (45.07 MB)
Variant_Analysis_Output_June-13_2021_hg19.zip (156.18 MB)
Variant_Analysis_Output_June-20_2021_hg19.zip (40.23 MB)
Variant_Analysis_Output_June-27_2021_hg19.zip (43.71 MB)
Variant_Analysis_Output_July-3_2021_hg19.zip (42.15 MB)
Variant_Analysis_Output_July-10_2021_hg19.zip (38.96 MB)
Variant_Analysis_Output_July-17_2021_hg19.zip (37.38 MB)
Variant_Analysis_Output_July-24_2021_hg19.zip (39.08 MB)
Variant_Analysis_Output_Jul-31_2021_hg19.zip (31.88 MB)
Variant_Analysis_Output_Aug-7_2021_hg19.zip (38.83 MB)
Variant_Analysis_Output_Aug-14_2021_hg19.zip (81.66 MB)
Variant_Analysis_Output_Aug-21_2021_hg19.zip (60.63 MB)
Variant_Analysis_Output_Aug-28_2021_hg19.zip (65.59 MB)
