human genomes

This dataset contains .pcap files collected during the execution of variant calling on a large number of human genomes using a cluster. The GATK4 variant calling pipeline was executed using AVAH on two testbeds, CloudLab and FABRIC. A 16-node cluster was used on CloudLab, and an 8-node cluster was used on FABRIC. The files were collected by running tcpdump on the network interfaces of the nodes.

871 Views

This data resource is an outcome of the NSF RAPID project titled "Democratizing Genome Sequence Analysis for COVID-19 Using CloudLab" awarded to the University of Missouri-Columbia.

The resource contains the output of variant analysis (along with CADD scores) on human genome sequences obtained from the COVID-19 Data Portal. The variants include single nucleotide polymorphisms (SNPs) and short insertions and deletions (indels).

1376 Views