CRAWDAD umich/virgil

Citation Author(s):
Anthony J. Nicholson (Google)
David Wetherall
Yatin Chawathe (Intel Research Seattle)
Mike Chen (Ludic Labs)
Brian Noble (University of Michigan, Ann Arbor)
Submitted by:
CRAWDAD Team
Last updated:
Mon, 03/31/2008 - 08:00
DOI:
10.15783/C7M881

Abstract 

War-walking data set collected in different cities in the United States for the field study and evaluation of an access point selection system.

We collected the data set through war walking, i.e., collecting Wi-Fi beacons by walking around the neighborhoods in different cities in the United States, for the field study and evaluation of Virgil, an access point selection system.

last modified : 2008-03-31

release date : 2008-03-28

date/time of measurement start : 2005-04-28

date/time of measurement end : 2005-09-16

collection environment : 802.11 wireless LAN access points (APs) are increasingly widespread in urban areas, with users commonly finding multiple APs on each scan. Therefore, access point selection, that is, determining which AP will provide the best quality of service, is a critical problem. We conducted a small field study to determine the scope of the problem with existing access point discovery and selection systems. Armed with the lessons from the field study, we designed a new AP selection system, which we named Virgil. We release two main tracesets: results from the field study (umich/virgil/field_study), and results from the evaluation of our prototype implementation of Virgil (umich/virgil/eval_data).

network configuration : For the field study, we walked a 1/2 square mile (1.3 square kilometer) grid of city streets with a Compaq iPAQ handheld (PDA) containing an 802.11b wireless LAN card, collecting data on the density and properties of different access points in an urban environment. The PDA ran Familiar Linux, a distribution targeted at handheld devices. Due to equipment failure, the evaluation runs used different hardware (a different iPAQ) than the field study.

data collection methodology : For the field study portion of our traces, data was collected in three different neighborhoods of Chicago, Illinois. For the evaluation, data was collected in five different neighborhoods in three different cities in the United States. Please see the tracesets /umich/virgil/field_study and /umich/virgil/eval_data for the methodology details.

sanitization : Only the SSIDs and MAC addresses have been altered. Each MAC address has been mapped to a string of the form mac:, where uniqid is an increasing value (starting at 0) for each neighborhood, determined by the order in which a given AP first appears in the trace. We use the 32-bit MD5 hash of each SSID string.
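The numbering scheme described above can be illustrated with a minimal sketch. It is not the authors' sanitization script: the function names are hypothetical, the exact layout of the published mac: string is not reproduced, and the full MD5 hex digest is returned because this page does not say how the SSID hash was shortened or encoded.

```python
import hashlib


def assign_uniqids(macs_in_trace_order):
    """Map each MAC address to an increasing integer, starting at 0,
    in order of first appearance within one neighborhood's trace."""
    uniqids = {}
    for mac in macs_in_trace_order:
        if mac not in uniqids:
            uniqids[mac] = len(uniqids)
    return uniqids


def ssid_md5(ssid):
    """Return the MD5 hex digest of an SSID string.

    Assumption: the SSID is hashed as UTF-8 bytes; the dataset's actual
    encoding and any truncation of the digest are not documented here.
    """
    return hashlib.md5(ssid.encode("utf-8")).hexdigest()
```

Because the counter restarts in each neighborhood, an identifier from one neighborhood's trace should not be compared with the same identifier from another.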

Traceset

umich/virgil/field_study

War-walking traceset collected in different neighborhoods of Chicago, Illinois for the field study of an access point selection system.

  • description: We collected the traceset through war walking, i.e., collecting Wi-Fi beacons by walking around the neighborhoods of Chicago, Illinois for the field study of Virgil, an access point selection system.
  • measurement purpose: Opportunistic Connectivity
  • methodology: We briefly summarize our methodology here. For full details, please refer to our paper: A.J. Nicholson et al., "Improved Access Point Selection", in Proceedings of MobiSys 2006. We walked each neighborhood with a Compaq iPAQ that contained an 802.11 wireless card. Our field study script repeatedly performed the following process. First, it scanned for AP beacons, then processed each detected AP in turn. For all APs, parameters such as AP MAC address, channel, encryption status, and signal strength were logged. Next, for each AP not using encryption, the script attempted to obtain an IP address from the AP via DHCP. The success of this operation was also logged. Finally, for APs that granted a DHCP address, the script ran a series of tests designed to probe the application-visible quality of the Internet connection provided by the AP. The script connected to a reference server at the University of Michigan to estimate the bandwidth and latency to an arbitrary Internet server, and to determine the status (open, closed, or redirected) of 37 common TCP ports.
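The control flow of the process summarized above can be sketched as follows. This is an illustration under stated assumptions, not the authors' field study script: every hardware-facing step (scan, try_dhcp, run_tests, log) is a hypothetical callable supplied by the caller, and the record keys simply reuse the field names from the dataset schema.

```python
def survey_once(scan, try_dhcp, run_tests, log):
    """Run one scan-and-test iteration and return one record per AP.

    scan()        -> list of dicts with at least an "encryption" key ("ON"/"OFF")
    try_dhcp(ap)  -> True if the AP granted a DHCP address
    run_tests(ap) -> dict of application-level results (RTT, bandwidth, ports)
    log(record)   -> sink for each finished record
    All four callables are hypothetical stand-ins for the real operations.
    """
    results = []
    for ap in scan():
        record = dict(ap)                 # link-layer parameters are logged for every AP
        record["dhcpsuccess"] = 0
        if ap["encryption"] == "OFF":     # only unencrypted APs are probed further
            if try_dhcp(ap):
                record["dhcpsuccess"] = 1
                record.update(run_tests(ap))  # tests run only after a DHCP lease
        log(record)
        results.append(record)
    return results
```

Injecting the callables keeps the sketch runnable without wireless hardware, for example by passing functions that replay recorded scan results.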

umich/virgil/field_study Traces

  • warwalk: War-walking traces collected in Chicago, Illinois for the field study of an access point selection system.
    • configuration: For the field study portion of our traces, data was collected in three different neighborhoods of Chicago, Illinois. The Loop (loop): the central business district. Data was collected during the day on a busy workday (Wednesday 4 May 2005, from 10:32 am to 3:47 pm). Wicker Park (wkpk): a high-density residential neighborhood northwest of downtown. Due to inclement weather, data was collected in two different sessions: Thursday 28 April 2005, from 3:40 pm to 5:04 pm, and Monday 2 May 2005, from 10:32 am to 11:40 am. Different areas of the neighborhood were probed on the two collection days, but the user will notice a small number of duplicate APs. We have kept the two traces separate for the benefit of CRAWDAD users who may want to draw conclusions from the timestamps in the traces. Evanston (evanston): a suburb and college town, north of the city limits. Data was collected on Thursday 5 May 2005, from 11:02 am to 12:49 pm. For all three neighborhoods, we walked a roughly 1/2 square-mile (1.3 square-kilometer) area on the sidewalk (following the street grid pattern). In the case of Wicker Park, both days of mapping together cover a 1/2 square-mile area. This methodology was in no way intended to duplicate a realistic mobility pattern, but rather to "map out" each neighborhood so that one can draw conclusions in aggregate concerning the quality, availability, and deployment of wireless connectivity in each instance.
    • format:

      The field study data are in the field_study subdirectory. Inside each
      directory, the user will find a schema file, which describes in detail
      the format and proper interpretation of the data files in each dataset.

      In the field_study directory, there are two data files for each collection run:

      trace.*: human-readable trace output of the field study script.
      Keywords "APSCAN begin" and "APSCAN end" delineate the
      results of a new scan for AP beacons. The test results
      of each AP are then presented in turn, separated by
      lines of "#####".

      ap_info.*: comma-separated value file, one line per AP encountered
      in the data collection session. The only information
      in the trace.* file that is not present in this file
      is the timestamps for all operations and the information
      on which groups of APs were discovered at the same time
      (in the same beacon scan set).

      Each line of data in the file maps to these values:

       

      struct ap_db_entry {
      ssid, AP SSID
      mac_addr, AP MAC address
      encryption, is WEP enabled? {ON,OFF}
      bitrate, Bitrate, in Mb/s, from iwconfig
      linkquality, link quality, from iwconfig
      signallevel, signal level, from iwconfig
      noiselevel, noise level, from iwconfig
      channel, frequency (GHz) of the AP
      dhcpsuccess, did AP grant DHCP address? (yes=1, no=0)

      /* Note: everything past this point is always 0 if
      * dhcpsuccess is 0 */

      rtt, round-trip-time to reference server (ms)
      /* port tests: 0=closed, 1=open, 2=redirected */
      port_21, ftp
      port_22, ssh
      port_23, telnet
      port_25, smtp
      port_79, finger
      port_80, http
      port_88, kerberos
      port_115, sftp
      port_119, nntp
      port_123, ntp
      port_135, rpc
      port_109, pop-2
      port_110, pop-3
      port_143, imap2
      port_194, irc
      port_201, appletalk
      port_369, coda
      port_443, https
      port_445, samba
      port_389, ldap
      port_636, secure ldap
      port_750, kerberos
      port_993, secure imap
      port_994, secure irc
      port_995, secure pop3
      port_1080, socks proxy
      port_1214, kazaa
      port_1434, ms sql server
      port_2049, nfs
      port_2430, venus (Coda)
      port_3306, mysql
      port_5010, yahoo messenger
      port_5190, AOL instant messenger
      port_5680, canna
      port_5800, vnc
      port_5860, vnc
      port_6346, gnutella
      port_7000, afs
      /* finally, the bandwidth value */
      bw, bandwidth to reference server (bytes/s)
      };

      The trace.* file can be considered the "primary source". We provide the ap_info.* files as a convenience to the user, so that each user of the data need not write the same script to parse out the most useful information.
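As a starting point for reading the ap_info.* files, the schema above can be turned into a small parser. This is a minimal sketch that assumes each line carries the values in exactly the order listed in the struct, with no header row; the function names are hypothetical, and values are kept as strings because the numeric formats are described only in the schema file.

```python
import csv

# Port order follows the struct above (note that it is not strictly ascending).
PORTS = [21, 22, 23, 25, 79, 80, 88, 115, 119, 123, 135, 109, 110, 143, 194,
         201, 369, 443, 445, 389, 636, 750, 993, 994, 995, 1080, 1214, 1434,
         2049, 2430, 3306, 5010, 5190, 5680, 5800, 6346, 7000]
LINK_FIELDS = ["ssid", "mac_addr", "encryption", "bitrate", "linkquality",
               "signallevel", "noiselevel", "channel", "dhcpsuccess"]
FIELDS = LINK_FIELDS + ["rtt"] + [f"port_{p}" for p in PORTS] + ["bw"]
PORT_STATUS = {0: "closed", 1: "open", 2: "redirected"}  # field study encoding


def parse_ap_info(lines):
    """Yield one dict per AP from an iterable of ap_info.* CSV lines."""
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        if len(row) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} fields, got {len(row)}")
        yield dict(zip(FIELDS, (value.strip() for value in row)))


def open_ports(entry):
    """Return the ports reported open for an AP that granted a DHCP address."""
    if entry["dhcpsuccess"] != "1":
        return []  # later fields are always 0 here, meaning "not tested"
    return [p for p in PORTS
            if PORT_STATUS.get(int(entry[f"port_{p}"])) == "open"]
```

Checking dhcpsuccess before interpreting the port columns matters: a 0 in a port column means "closed" only when the tests actually ran.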

umich/virgil/eval_data

War-walking traceset collected in different cities in the United States for the evaluation of an access point selection system.

  • description: We collected the trace set through war walking, i.e., collecting Wi-Fi beacons by walking around the neighborhoods in different cities in the United States, for the evaluation of Virgil, an access point selection system.
  • measurement purpose: Opportunistic Connectivity
  • methodology: We briefly summarize our methodology here. For full details, please refer to our paper: A.J. Nicholson et al., "Improved Access Point Selection", in Proceedings of MobiSys 2006. We walked each neighborhood with a Compaq iPAQ that contained an 802.11 wireless card. Unlike for the field study data set, we did not just periodically scan for available APs and test their capabilities. The Virgil AP selection daemon periodically scanned and tested APs to locate a usable AP, but once one was found, it stayed associated with that AP until the iPAQ passed out of its radio range. As a result, users of this dataset will notice significant gaps between scan sets. These gaps are the times during which the device was associated with an access point. Note: due to a bug, all test results (AP frequency, signal strength, et cetera) for APs using WEP encryption were mistakenly set to 0 when Virgil wrote out the logs. This did not affect our results because none of the algorithms in the evaluation attempted to use these encrypted, inaccessible APs. We regret, however, that the data on the link-layer properties of these encrypted APs is unavailable to the user. We recommend the field study dataset for those who require such data. Also note that, unlike in the field study, the Virgil daemon caches test results for performance. Therefore, once a given AP has been seen in a neighborhood trace, the application-level tests are not re-run when it is subsequently detected; instead, the cached test results are written out to the log.
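The gaps between scan sets mentioned above can be located from the scan timestamps alone. The following is a minimal sketch, not part of the dataset tooling: the function name and the one-minute default threshold are assumptions chosen for illustration, and a suitable threshold depends on the daemon's actual scan interval.

```python
from datetime import datetime, timedelta


def association_gaps(scan_times, min_gap=timedelta(minutes=1)):
    """Return (start, end) pairs for consecutive scan sets that are
    separated by at least min_gap.

    min_gap is a hypothetical threshold; the documentation does not
    state how long a gap must be to indicate an association period.
    """
    ordered = sorted(scan_times)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a >= min_gap]
```

Each returned pair brackets a period in which, per the description above, the device was likely associated with an AP rather than scanning.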

umich/virgil/eval_data Traces

  • warwalk: War-walking traceset collected in different cities in the United States for the evaluation of an access point selection system.
    • configuration: For our evaluation, data was collected in five different neighborhoods in three different cities in the United States. All timestamps in the data files are UTC, so the local times must be calculated accordingly. Because daylight saving time was in effect, Ann Arbor was UTC-4, Chicago UTC-5, and Seattle UTC-7. Neighborhoods: Chicago Loop (loop): the central business district. Data was collected during the day on a busy workday (Tuesday, 19 July 2005, 3:30-4:35 pm local time). Chicago, Wicker Park (wkpk): a high-density residential neighborhood northwest of downtown. Data was collected on Monday, 18 July 2005, 7:40-9:13 am local time. Chicago, Evanston (evanston): a suburb and college town, north of the city limits. Data was collected on Monday, 18 July 2005, 11:44 am to 3:20 pm. Downtown Seattle (seattle): the central business district. Data was collected on Wednesday, 20 July 2005, from 7:18 pm until 12:03 am on 21 July (five hours later). Ann Arbor, Michigan: the downtown area. Data was collected on Friday, 16 September 2005, 9:41-10:44 am. In each neighborhood, we walked a roughly 1/2 square-mile (1.3 square-kilometer) area on the sidewalk (following the street grid pattern).
    • format:

      The evaluation data are in the eval_data directory. Inside each directory,
      the user will find a schema file, which describes in detail the format
      and proper interpretation of the data files in each dataset.

      In the eval_data directory, for each of the five neighborhoods, we provide a
      scansets.* file. This file consists of a series of scan
      sets. A scan set is defined as the test results for a given set of
      APs, whose AP beacons the Virgil daemon detected when searching for a
      new AP at a given physical spot.

      The first line of each scan set is of the form:

      SCAN_SET 3 |2005-07-21_02:19:16.808727

      where the "3" denotes this is the third scan set in the neighborhood's
      trace, and the remainder of the line is the time instant (in UTC) at
      which the scan occurred.

      The remainder of each scan set consists of a series of lines, where
      each line corresponds to an AP in the scan set. Each line is a series
      of comma-separated values comprising the test result for the AP in
      question:

      struct ap_db_entry {
      ssid, AP SSID
      mac_addr, AP MAC address
      encryption, is WEP enabled? {ON,OFF}
      linkquality, link quality, x/92, from iwconfig
      signallevel, signal level, -x dBm, from iwconfig
      noiselevel, noise level, -x dBm, from iwconfig
      channel, frequency (GHz) of the AP
      dhcpsuccess, did AP grant DHCP address? (yes=1, no=0)
      test_results, optional test results (described below)
      };

      If the AP did not grant a DHCP address (dhcpsuccess==0), then the line
      terminates with the dhcpsuccess parameter. Otherwise, the next item is
      the round-trip-time (RTT) estimate in ms, then the bandwidth estimate
      in bytes/sec. Finally, there is a sequence of tuples (port,status),
      where port is a TCP port number, and status is one of {CLOSED=1,
      OPEN=2, REDIRECTED=3}. Note that these constants are different from
      those defined in the field study dataset.
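The scan set layout and the UTC offsets given above can be combined into a small reader. This is a sketch under stated assumptions rather than a reference parser: the function names are hypothetical, the "annarbor" key is an invented label (no short name is listed for Ann Arbor), and the (port,status) tuples are assumed to appear as consecutive comma-separated values, optionally wrapped in parentheses. The schema file in the dataset is the authority on the exact syntax.

```python
from datetime import datetime, timedelta, timezone

# UTC offsets stated in the configuration notes (daylight saving time in effect).
UTC_OFFSET_HOURS = {"annarbor": -4, "loop": -5, "wkpk": -5,
                    "evanston": -5, "seattle": -7}
LINK_FIELDS = ["ssid", "mac_addr", "encryption", "linkquality",
               "signallevel", "noiselevel", "channel", "dhcpsuccess"]
PORT_STATUS = {1: "closed", 2: "open", 3: "redirected"}  # evaluation encoding


def parse_header(line):
    """Parse 'SCAN_SET 3 |2005-07-21_02:19:16.808727' into (index, UTC datetime)."""
    left, stamp = line.split("|", 1)
    index = int(left.split()[1])
    when = datetime.strptime(stamp.strip(), "%Y-%m-%d_%H:%M:%S.%f")
    return index, when.replace(tzinfo=timezone.utc)


def parse_ap_line(line):
    """Parse one comma-separated AP line of a scan set into a dict."""
    values = [v.strip() for v in line.split(",")]
    entry = dict(zip(LINK_FIELDS, values))
    rest = values[len(LINK_FIELDS):]
    if entry.get("dhcpsuccess") == "1" and len(rest) >= 2:
        entry["rtt_ms"], entry["bw_bytes_per_s"] = rest[0], rest[1]
        # Assumption: tuples are flattened to port,status,port,status,...
        flat = [v.strip("() ") for v in rest[2:]]
        entry["ports"] = {int(p): PORT_STATUS.get(int(s), s)
                          for p, s in zip(flat[0::2], flat[1::2])}
    return entry


def parse_scansets(lines):
    """Group AP lines under the SCAN_SET header that precedes them."""
    scan_sets = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("SCAN_SET"):
            index, when = parse_header(line)
            scan_sets.append({"index": index, "utc": when, "aps": []})
        elif scan_sets:
            scan_sets[-1]["aps"].append(parse_ap_line(line))
    return scan_sets


def to_local(utc_time, neighborhood):
    """Convert a UTC timestamp to the neighborhood's local time in 2005."""
    offset = timezone(timedelta(hours=UTC_OFFSET_HOURS[neighborhood]))
    return utc_time.astimezone(offset)
```

Applied to the example header above with the Seattle offset, to_local yields an evening time on 20 July, which falls inside the Seattle collection window. Remember that link-layer values of 0 for WEP-enabled APs reflect the logging bug described in the methodology, not real measurements.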

Instructions: 

The files in this directory are a CRAWDAD dataset hosted by IEEE DataPort. 

About CRAWDAD: the Community Resource for Archiving Wireless Data At Dartmouth is a data resource for the research community interested in wireless networks and mobile computing. 

CRAWDAD was founded at Dartmouth College in 2004, led by Tristan Henderson, David Kotz, and Chris McDonald. CRAWDAD datasets are hosted by IEEE DataPort as of November 2022. 

Note: Please use the Data in an ethical and responsible way with the aim of doing no harm to any person or entity for the benefit of society at large. Please respect the privacy of any human subjects whose wireless-network activity is captured by the Data and comply with all applicable laws, including without limitation such applicable laws pertaining to the protection of personal information, security of data, and data breaches. Please do not apply, adapt or develop algorithms for the extraction of the true identity of users and other information of a personal nature, which might constitute personally identifiable information or protected health information under any such applicable laws. Do not publish or otherwise disclose to any other person or entity any information that constitutes personally identifiable information or protected health information under any such applicable laws derived from the Data through manual or automated techniques. 

Please acknowledge the source of the Data in any publications or presentations reporting use of this Data. 

Citation:

Anthony J. Nicholson, David Wetherall, Yatin Chawathe, Mike Chen, Brian Noble, umich/virgil, https://doi.org/10.15783/C7M881 , Date: 20080328


Documentation

Attachment: umich-virgil-readme.txt
Size: 1.62 KB

These datasets are part of Community Resource for Archiving Wireless Data (CRAWDAD). CRAWDAD began in 2004 at Dartmouth College as a place to share wireless network data with the research community. Its purpose was to enable access to data from real networks and real mobile users at a time when collecting such data was challenging and expensive. The archive has continued to grow since its inception, and starting in summer 2022 is being housed on IEEE DataPort.
