human genomes
This dataset contains .pcap files collected during the execution of variant calling on large number of human genomes using a cluster. The GATK4 variant calling pipeline was executed using AVAH in two testbeds, CloudLab and FABRIC. A 16-node cluster was used on CloudLab, and an 8-node cluster was used on FABRIC. The files were collected by running tcpdump on the network interfaces of the nodes.
- Categories:
This data resource is an outcome of the NSF RAPID project titled "Democratizing Genome Sequence Analysis for COVID-19 Using CloudLab" awarded to University of Missouri-Columbia.
The resource contains the output of variant analysis (along with CADD scores) on human genome sequences obtained from the COVID-19 Data Portal. The variants include single nucleotide polymorphisms (SNPs) and short insert and deletes (indels).
- Categories: