Standard Dataset
A Dataset of Network Traffic Collected During Large-Scale Human Genome Sequence Analysis
- Submitted by: Praveen Rao
- Last updated: Tue, 05/30/2023 - 17:04
- DOI: 10.21227/y0t5-1w13
Abstract
This dataset contains .pcap files collected while running variant calling on a large number of human genomes using a cluster. The GATK4 variant calling pipeline was executed using AVAH on two testbeds, CloudLab and FABRIC. A 16-node cluster was used on CloudLab, and an 8-node cluster was used on FABRIC. The files were collected by running tcpdump on the network interfaces of the nodes. One file was produced every 30 minutes; a snapshot length of 94 bytes was specified for tcpdump. The CloudLab cluster used bare-metal servers, whereas the FABRIC cluster used virtual machines. Each .pcap file is named with the worker/host name and the start time of the traffic capture for that file.
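For readers who want to collect comparable traces, the capture settings described above can be sketched as a tcpdump invocation. The exact command used for this dataset is not given, so the following is only an illustration: the interface name, output directory, and file-name pattern are hypothetical, and the use of tcpdump's `-G` option for the 30-minute rotation is an assumption. Only the 94-byte snapshot length and the 30-minute interval come from the description.

```python
import shlex

SNAPLEN_BYTES = 94          # snapshot length stated in the abstract
ROTATE_SECONDS = 30 * 60    # one file every 30 minutes, as stated in the abstract


def build_tcpdump_command(interface, host, out_dir):
    """Build (but do not run) a tcpdump command with the dataset's capture settings.

    The file-name pattern is a hypothetical example; the dataset only states that
    each file name contains the worker/host name and the capture start time.
    """
    # tcpdump expands strftime(3) fields in the -w name when -G is used.
    out_pattern = f"{out_dir}/{host}_%Y-%m-%d_%H-%M-%S.pcap"
    return [
        "tcpdump",
        "-i", interface,             # network interface to capture on
        "-s", str(SNAPLEN_BYTES),    # truncate each packet to 94 bytes
        "-G", str(ROTATE_SECONDS),   # start a new output file every 1800 seconds
        "-w", out_pattern,           # write raw packets to rotating .pcap files
    ]


if __name__ == "__main__":
    # "eth0", "vm1", and "/tmp/captures" are placeholder values.
    print(shlex.join(build_tcpdump_command("eth0", "vm1", "/tmp/captures")))
```

With a 94-byte snapshot length, each record keeps little more than the packet headers, which keeps the files small while preserving the original packet lengths in the per-packet metadata.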
1. Download the .pcap.tar.gz files. The name of the testbed is provided as a prefix for each tarball. Each tarball corresponds to traffic sent/received by one worker node in the cluster.
2. Unzip/untar the files using tar. For example, use: tar xvfz <testbed>-vm1.tar.gz
3. Use Wireshark (https://www.wireshark.org/) or tshark (https://www.wireshark.org/docs/man-pages/tshark.html) to analyze the traffic data.
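As a lightweight alternative to opening each file in Wireshark, the sketch below reads an extracted .pcap file with the Python standard library only and reports its snapshot length, packet count, and byte totals. It assumes the files are in the classic libpcap format (tcpdump's default output), not pcapng; the function name `summarize_pcap` is introduced here for illustration.

```python
import struct
import sys

# Classic pcap magic numbers mapped to the byte order of the file.
PCAP_MAGICS = {
    b"\xd4\xc3\xb2\xa1": "<",  # little-endian, microsecond timestamps
    b"\xa1\xb2\xc3\xd4": ">",  # big-endian, microsecond timestamps
    b"\x4d\x3c\xb2\xa1": "<",  # little-endian, nanosecond timestamps
    b"\xa1\xb2\x3c\x4d": ">",  # big-endian, nanosecond timestamps
}


def summarize_pcap(path):
    """Return snaplen, link type, packet count, and byte totals for one pcap file."""
    with open(path, "rb") as f:
        global_header = f.read(24)
        if len(global_header) < 24 or global_header[:4] not in PCAP_MAGICS:
            raise ValueError(f"{path}: not a classic pcap file")
        endian = PCAP_MAGICS[global_header[:4]]
        # version major/minor, timezone offset, sigfigs, snaplen, link type
        _, _, _, _, snaplen, linktype = struct.unpack(
            endian + "HHiIII", global_header[4:]
        )
        packets = captured = on_wire = 0
        while True:
            record_header = f.read(16)
            if len(record_header) < 16:
                break  # end of file (or a truncated final record)
            # ts_sec, ts_frac, bytes stored in file, original bytes on the wire
            _, _, incl_len, orig_len = struct.unpack(endian + "IIII", record_header)
            f.seek(incl_len, 1)  # skip the stored packet bytes
            packets += 1
            captured += incl_len
            on_wire += orig_len
    return {
        "snaplen": snaplen,
        "linktype": linktype,
        "packets": packets,
        "captured_bytes": captured,
        "on_wire_bytes": on_wire,
    }


if __name__ == "__main__":
    for pcap_path in sys.argv[1:]:
        print(pcap_path, summarize_pcap(pcap_path))
```

Because the captures were truncated to 94 bytes per packet, `captured_bytes` will generally be much smaller than `on_wire_bytes`; the latter is the figure to use when estimating actual traffic volume.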
Dataset Files
- Worker 1/CloudLab testbed CloudLab-vm1.tar.gz (5.22 GB)
- Worker 2/CloudLab testbed CloudLab-vm2.tar.gz (6.02 GB)
- Worker 3/CloudLab testbed CloudLab-vm3.tar.gz (5.76 GB)
- Worker 4/CloudLab testbed CloudLab-vm4.tar.gz (5.74 GB)
- Worker 5/CloudLab testbed CloudLab-vm5.tar.gz (7.07 GB)
- Worker 6/CloudLab testbed CloudLab-vm6.tar.gz (5.34 GB)
- Worker 7/CloudLab testbed CloudLab-vm7.tar.gz (6.02 GB)
- Worker 8/CloudLab testbed CloudLab-vm8.tar.gz (6.24 GB)
- Worker 9/CloudLab testbed CloudLab-vm9.tar.gz (6.17 GB)
- Worker 10/CloudLab testbed CloudLab-vm10.tar.gz (9.07 GB)
- Worker 11/CloudLab testbed CloudLab-vm11.tar.gz (5.81 GB)
- Worker 12/CloudLab testbed CloudLab-vm12.tar.gz (9.15 GB)
- Worker 13/CloudLab testbed CloudLab-vm13.tar.gz (5.27 GB)
- Worker 14/CloudLab testbed CloudLab-vm14.tar.gz (5.69 GB)
- Worker 15/CloudLab testbed CloudLab-vm15.tar.gz (6.79 GB)
- Worker 1/FABRIC testbed FABRIC-vm1.tar.gz (3.63 GB)
- Worker 2/FABRIC testbed FABRIC-vm2.tar.gz (3.13 GB)
- Worker 3/FABRIC testbed FABRIC-vm3.tar.gz (5.46 GB)
- Worker 4/FABRIC testbed FABRIC-vm4.tar.gz (3.28 GB)
- Worker 5/FABRIC testbed FABRIC-vm5.tar.gz (6.78 GB)
- Worker 6/FABRIC testbed FABRIC-vm6.tar.gz (3.36 GB)
- Worker 7/FABRIC testbed FABRIC-vm7.tar.gz (3.42 GB)
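To plan disk space before downloading, the compressed sizes listed above can be totaled per testbed. The sketch below copies the file names and sizes from the list; the helper name `total_by_testbed` is introduced here for illustration, and the totals cover only the compressed tarballs, not the extracted .pcap files.

```python
import re
from collections import defaultdict

# File names and sizes copied from the "Dataset Files" list.
LISTING = """\
CloudLab-vm1.tar.gz (5.22 GB)
CloudLab-vm2.tar.gz (6.02 GB)
CloudLab-vm3.tar.gz (5.76 GB)
CloudLab-vm4.tar.gz (5.74 GB)
CloudLab-vm5.tar.gz (7.07 GB)
CloudLab-vm6.tar.gz (5.34 GB)
CloudLab-vm7.tar.gz (6.02 GB)
CloudLab-vm8.tar.gz (6.24 GB)
CloudLab-vm9.tar.gz (6.17 GB)
CloudLab-vm10.tar.gz (9.07 GB)
CloudLab-vm11.tar.gz (5.81 GB)
CloudLab-vm12.tar.gz (9.15 GB)
CloudLab-vm13.tar.gz (5.27 GB)
CloudLab-vm14.tar.gz (5.69 GB)
CloudLab-vm15.tar.gz (6.79 GB)
FABRIC-vm1.tar.gz (3.63 GB)
FABRIC-vm2.tar.gz (3.13 GB)
FABRIC-vm3.tar.gz (5.46 GB)
FABRIC-vm4.tar.gz (3.28 GB)
FABRIC-vm5.tar.gz (6.78 GB)
FABRIC-vm6.tar.gz (3.36 GB)
FABRIC-vm7.tar.gz (3.42 GB)
"""

ENTRY = re.compile(r"(\w+)-vm\d+\.tar\.gz \(([\d.]+) GB\)")


def total_by_testbed(listing):
    """Sum the listed tarball sizes (in GB) for each testbed prefix."""
    totals = defaultdict(float)
    for testbed, size_gb in ENTRY.findall(listing):
        totals[testbed] += float(size_gb)
    return {testbed: round(total, 2) for testbed, total in totals.items()}


if __name__ == "__main__":
    for testbed, total_gb in total_by_testbed(LISTING).items():
        print(f"{testbed}: {total_gb} GB compressed")
```

This works out to roughly 95 GB for the CloudLab tarballs and 29 GB for the FABRIC tarballs, so about 125 GB of free space is needed for the downloads alone.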