CRAWDAD ucsd/cse

Citation Author(s):
Yu-Chung Cheng
Submitted by:
CRAWDAD Team
Last updated:
Tue, 09/30/2008 - 08:00
DOI:
10.15783/C7S015

Abstract 

Dataset of comprehensive traces of wireless activity in the UCSD Computer Science building.

To characterize the sources of delay in a production 802.11 network, we collected comprehensive traces of wireless activity in the UCSD Computer Science building.

last modified : 2008-09-30

release date : 2008-08-25

date/time of measurement start : 2007-01-11

date/time of measurement end : 2007-01-11

collection environment : To characterize the sources of delay in a production 802.11 network, we collected comprehensive traces of wireless activity in the UCSD Computer Science building. The traces were collected on Thursday, January 11, 2007.

network configuration : The production 802.11 network consists of 40 Avaya AP-8 802.11 b/g access points covering four floors and the basement. The APs are identically configured (except for their channel assignment) and support both 802.11b and 802.11g without encryption. The locations of the 40 APs are given in [labels.txt] and in five .png files: [1st floor], [2nd floor], [3rd floor], [4th floor], and [basement].

data collection methodology : We use the Jigsaw system described in [cheng-jigsaw] to collect the traces. Jigsaw is a distributed wireless monitoring platform that we have deployed in our department building to monitor the production 802.11 network. The hardware monitors consist of 192 radios interspersed among the infrastructure APs. The radios passively monitor the wireless network and report all wireless events across location, channel, and time via a private wired network to a back-end storage server. Jigsaw merges and time-synchronizes these separate radio traces into a single, global, unified trace. Moreover, Jigsaw performs this operation in real time; a single 2.2 GHz AMD Opteron server can synchronize one minute of raw trace data in under 15 seconds. We configure Jigsaw to capture the first 120 bytes of each wireless frame. As a result, the aggregate monitor traffic from all radios ranges from 2 to 10 Mbps and is roughly five times the amount of production wireless traffic.

sanitization : The last 3 octets of each MAC address are anonymized, unless they are 0:0:0, but the OUIs are preserved. All IP addresses in IP headers or payloads are anonymized as well, except for the UCSD wireless subnet prefix (128.54.42/16, 42.0.0.0/16) and private addresses. We do not preserve any other IP prefixes. Everything beyond the TCP, UDP, and DHCP headers is removed. We also do not recompute the IP/TCP checksums.
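The MAC rule above (keep the OUI, replace the last 3 octets, leave 0:0:0 alone) can be illustrated with a minimal sketch. The dataset description does not say how the anonymized octets were derived, so the keyed hash, the `key` parameter, and the function name `anonymize_mac` are assumptions made only for illustration and are not the authors' method.

```python
import hashlib
import hmac


def anonymize_mac(mac: str, key: bytes) -> str:
    """Keep the OUI (first 3 octets) and replace the last 3 octets.

    Addresses whose last 3 octets are all zero are returned unchanged,
    mirroring the 0:0:0 exception in the sanitization notes. The HMAC-based
    mapping is an assumption; the dataset does not document the real one.
    """
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"not a MAC address: {mac!r}")
    oui, nic = octets[:3], octets[3:]
    if nic == [0, 0, 0]:
        return ":".join(f"{o:02x}" for o in octets)
    # Deterministic keyed hash, so one address always maps to one pseudonym.
    digest = hmac.new(key, bytes(octets), hashlib.sha256).digest()
    return ":".join(f"{o:02x}" for o in oui + list(digest[:3]))
```

Because the mapping is deterministic for a given key, the same station keeps the same pseudonym throughout a trace, and the vendor can still be read from the preserved OUI.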

limitation : Note that the wired packets and the wireless packets do not match one-to-one:

  • A single wired packet may correspond to multiple 802.11 transmissions because of retransmissions.
  • APs forward only 802.11 data frames. Management, control, and NULL frames exist only on the 802.11 network.
  • The sniffers may pick up signals from non-CSE APs. Similarly, the sniffers may miss CSE packets.
  • The wired gateway forwards broadcast traffic among the wireless VLANs of two other nearby buildings.
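One way to cope with the first limitation is to collapse 802.11 retransmissions before comparing against the wired trace. The sketch below relies on the fact that a retransmitted 802.11 frame keeps its sequence number and has the Retry bit set. The `Frame` record, the function name, and the one-second `window` are hypothetical; they are not part of the dataset or of Jigsaw, and fragmentation is ignored.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Frame(NamedTuple):
    # Hypothetical, already-decoded view of one 802.11 data frame.
    timestamp: float   # seconds
    transmitter: str   # transmitter MAC address
    seq: int           # 12-bit 802.11 sequence number (0..4095)
    retry: bool        # Retry bit from the Frame Control field


def count_transmission_attempts(
    frames: Iterable[Frame], window: float = 1.0
) -> List[Tuple[Frame, int]]:
    """Collapse retransmissions into (first_seen_frame, attempts) pairs.

    A frame joins an existing group only if it has the Retry bit set, shares
    transmitter and sequence number with the group, and arrives within
    `window` seconds of the group's latest frame (sequence numbers wrap).
    """
    latest: Dict[Tuple[str, int], int] = {}   # key -> index into groups
    groups: List[Tuple[Frame, int, float]] = []  # (first, attempts, last_ts)
    for frame in sorted(frames, key=lambda f: f.timestamp):
        key = (frame.transmitter, frame.seq)
        idx = latest.get(key)
        if (idx is not None and frame.retry
                and frame.timestamp - groups[idx][2] <= window):
            first, attempts, _ = groups[idx]
            groups[idx] = (first, attempts + 1, frame.timestamp)
        else:
            # New frame, or a retry whose original the sniffers missed.
            latest[key] = len(groups)
            groups.append((frame, 1, frame.timestamp))
    return [(first, attempts) for first, attempts, _ in groups]
```

Since the sniffers may miss frames, a group's attempt count is a lower bound, and a group may begin with a frame that already carries the Retry bit.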

Traceset

ucsd/cse/jigsaw

Jigsaw traces of wireless activity in the UCSD Computer Science building.

  • files: jcap_hdr.h, wireless
  • description: We used Jigsaw - a tool for analyzing wireless traffic - to collect comprehensive traces of wireless activity in the UCSD Computer Science building.
  • measurement purpose: Network Diagnosis
  • methodology:

      1. Software

      Jigsaw is a tool for analyzing wireless traffic across locations, channels, time, and protocol layers. It takes traces from multiple sniffers at distinct vantage points, identifies and synchronizes the duplicate wireless frames in the traces, and rebuilds link-layer and transport-layer conversations. This version also includes a MadWiFi driver patch that reduces the overhead of excessive logging of PHY and CRC error events. Jigsaw is available under the GPL license.

      2. Hardware

      The core of each wireless node/sensor is a Soekris net4801 or net4826 embedded computer, which has a 266 MHz 586-class (Geode) single-chip processor. The 4801 board includes one Compact Flash slot, three 10/100 Ethernet ports, 128 megabytes of RAM, serial ports, and a MiniPCI/PCI slot. In addition, the 4801 has an IDE port and two USB 1.1 ports. The 4826 can be powered over Ethernet. Most of our nodes are 4826 boxes.

      The Compact Flash slot is loaded with a Compact Flash card (256M on the 4801, 64M on the 4826), which stores the moderately patched Pebble Linux image and related files. In normal operation the card is mounted read-only to reduce wear and to help ensure filesystem consistency in the face of power outages. A small portion of the memory is mounted for read-write file system access.

      Each node is equipped with two Atheros-based 802.11 a/b/g wireless cards. Two NICs enable a broader range of experiments. Each radio is attached to a 5 dBi omni-directional antenna. We use heavily patched versions of the Atheros MadWiFi driver for these radios.

      Originally, each 4801 had an IDE hard disk of at least 20 gigabytes, but we found that hard disk failure was the major cause of crashes, so we removed the disks from the 4801 boxes. Otherwise, the boxes are quite stable and seldom crash, apart from bugs in our own kernel and driver code. For our traffic monitoring project, all traces are dumped directly over NFS to a single 2 TB RAID 0 storage server.

      We have done several things to help us test new software and run experiments more conveniently. First, we install and re-install the kernels and other software through a master controller to keep all software synchronized and up to date automatically. It usually takes 1-2 minutes to re-install everything on all boxes. Because kernel logs are stored in in-memory file systems and are lost on reboot, we log all kernel messages remotely to our master server. This helps us perform post-crash analysis and makes system management easier in general. When the kernel hangs or panics, or when for some reason we cannot log in to perform a manual reboot, we can remotely reboot these boxes (and instruct them to boot a stable kernel) within a minute. In addition, we use the Geode CPU watchdog functions to make the boxes reboot themselves after a certain timeout. Thus we minimize manual intervention for software updates, experiments, and debugging.

ucsd/cse/jigsaw Traces

  • wireless: Jigsaw traces of wireless activity in the UCSD Computer Science building.
    • configuration: the merged Jigsaw traces collected using 192 sniffers in the UCSD CSE building.
    • format:

      The file is a series of packets, each preceded by a jcap_hdr header ([jcap_hdr.h]), similar to the pcap_pkthdr packet format.

      We created our own header simply to save space.

ucsd/cse/tcpdump

Tcpdump traces of wireless activity in the UCSD Computer Science building.

  • files: wired
  • description: We collected tcpdump traces of wireless activity in the UCSD Computer Science building.
  • measurement purpose: Network Diagnosis
  • methodology: The tcpdump trace was collected at the gateway router that interfaces the campus gigabit Ethernet network and the CSE wireless VLAN.

ucsd/cse/tcpdump Traces

  • wired: Tcpdump traces of wireless activity in the UCSD Computer Science building.
    • configuration: the tcpdump trace collected at the gateway router that interfaces the campus gigabit Ethernet network and the CSE wireless VLAN.
    • format:

      The format is gzipped tcpdump pcap.
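As a minimal sketch of how a gzipped pcap trace such as the wired one can be read without extra libraries, the following uses only the Python standard library and the classic pcap layout (a 24-byte global header followed by 16-byte per-packet record headers). It assumes microsecond-resolution classic pcap, not pcapng. The function names are illustrative, and this reader does not apply to the Jigsaw traces, which use the separate jcap_hdr header.

```python
import gzip
import struct
from typing import BinaryIO, Iterator, Tuple


def read_pcap_records(stream: BinaryIO) -> Iterator[Tuple[float, int, bytes]]:
    """Yield (timestamp, original_length, captured_bytes) per packet."""
    header = stream.read(24)
    if len(header) < 24:
        raise ValueError("truncated pcap global header")
    magic = header[:4]
    if magic == b"\xd4\xc3\xb2\xa1":      # 0xa1b2c3d4 stored little-endian
        endian = "<"
    elif magic == b"\xa1\xb2\xc3\xd4":    # 0xa1b2c3d4 stored big-endian
        endian = ">"
    else:
        raise ValueError("not a classic microsecond-resolution pcap file")
    while True:
        record = stream.read(16)
        if len(record) < 16:
            return  # clean end of file, or a truncated trailing record
        ts_sec, ts_usec, incl_len, orig_len = struct.unpack(
            endian + "IIII", record
        )
        data = stream.read(incl_len)
        if len(data) < incl_len:
            return  # truncated packet body
        yield ts_sec + ts_usec / 1e6, orig_len, data


def read_gzipped_pcap(path: str) -> Iterator[Tuple[float, int, bytes]]:
    """Open a gzipped pcap file and yield its packet records."""
    with gzip.open(path, "rb") as stream:
        yield from read_pcap_records(stream)
```

For example, `sum(1 for _ in read_gzipped_pcap("wired.pcap.gz"))` would count the packets in a file, where `wired.pcap.gz` is a hypothetical file name. Because payloads beyond the TCP, UDP, and DHCP headers were removed and checksums were not recomputed, `captured_bytes` will usually be shorter than `original_length`, and checksum validation is expected to fail.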

Instructions: 

The files in this directory are a CRAWDAD dataset hosted by IEEE DataPort. 

About CRAWDAD: the Community Resource for Archiving Wireless Data At Dartmouth is a data resource for the research community interested in wireless networks and mobile computing. 

CRAWDAD was founded at Dartmouth College in 2004, led by Tristan Henderson, David Kotz, and Chris McDonald. CRAWDAD datasets are hosted by IEEE DataPort as of November 2022. 

Note: Please use the Data in an ethical and responsible way with the aim of doing no harm to any person or entity for the benefit of society at large. Please respect the privacy of any human subjects whose wireless-network activity is captured by the Data and comply with all applicable laws, including without limitation such applicable laws pertaining to the protection of personal information, security of data, and data breaches. Please do not apply, adapt or develop algorithms for the extraction of the true identity of users and other information of a personal nature, which might constitute personally identifiable information or protected health information under any such applicable laws. Do not publish or otherwise disclose to any other person or entity any information that constitutes personally identifiable information or protected health information under any such applicable laws derived from the Data through manual or automated techniques. 

Please acknowledge the source of the Data in any publications or presentations reporting use of this Data. 

Citation:

Yu-Chung Cheng, ucsd/cse, https://doi.org/10.15783/C7S015, Date: 20080825

Dataset Files