CRAWDAD stanford/gates
- Submitted by: CRAWDAD Team
- Last updated: Tue, 11/14/2006 - 08:00
- DOI: 10.15783/C7NC70
- Collection: CRAWDAD
Abstract
This dataset contains traces of the Stanford CS department's wireless network.
date/time of measurement start: 1999-09-20
date/time of measurement end: 1999-12-12
collection environment: We collected a 12-week trace of a local-area wireless network installed throughout the Gates Computer Science Building of Stanford University. The building is L-shaped (the longer edge is called the a-wing, and the shorter the b-wing). It has four main floors with offices and labs, a basement with classrooms and labs, and a fifth floor with a lounge and a few offices. Each of the main floors has two access points, one for each wing. Additionally, the first floor has an access point for a large conference room; the library, which spans both the second and third floors, also has an access point. The basement has two access points, one near the classrooms and one for the Interactive Room, a special research project in the department. The smaller fifth floor only has one access point.
The wireless user community consists of 74 users who can be roughly divided into four groups:
- 35 first year PhD students, who were each given a laptop with a WaveLAN card upon arrival (which corresponds to the beginning of the trace). Their offices are primarily in the 2b wing.
- 22 graphics students and staff, the majority of whom received laptops with WaveLAN cards a week into the tracing period. Their offices are primarily in the 3b wing.
- Three robots, used by the robotics lab for research. The robots do not have to authenticate themselves to reach the outside network. While the robots are somewhat mobile, they stay in the 1a wing. Although these WaveLAN cards are intended to be used by the robots, students in the robotics lab also use the network cards for session connections and websurfing.
- 14 other users (students, staff, and faculty) scattered throughout the building.
In addition to these 74 users, there were also four users who authenticated themselves but only connected to wired ports on the public subnet rather than the wireless network. We do not consider these users in the rest of this analysis of the wireless network.
network configuration: In the Gates Computer Science Building at Stanford University, administrators have made a "public" subnet available for any user affiliated with the university. Users desiring network access via this subnet must authenticate themselves to use their dynamically assigned IP address to access the rest of the departmental and university networks and the Internet. This subnet is accessible both from a wireless network and from Ethernet ports in public places in the building, such as conference rooms, lounges, the library, and labs. The wireless network is a WaveLAN network with WavePoint II access points acting as bridges between the wireless and wired networks. The access points each have two slots for wireless network interfaces; both slots are filled, one with older 2 Mbps cards to support the few users who have not updated their hardware yet, and the other with WaveLAN IEEE802.11-compatible 10 Mbps cards. Because all of the wireless users are on a single subnet (which promotes roaming without the need for Mobile IP or other such support), we gathered traces on the router that connects the public subnet to the rest of the departmental wired network. The router is a 90 MHz Pentium running RedHat Linux with two 10 Mbps network interfaces. One interface connects to the public subnet, and the other connects to the departmental network.
data collection methodology: To gather all of the information we wanted, we collected three separate types of traces during a 12-week period encompassing the 1999 Fall quarter (from Monday, September 20 through Sunday, December 12).
The first trace we gathered is a tcpdump trace of the link-level and network-level headers of all packets that went through the router. We use this information in conjunction with the other two traces.
The second trace is an SNMP trace. Approximately every two minutes, the router queries, via Ethernet, all twelve access points for the MAC addresses of the hosts currently using that access point as a bridge to the wired network. Once we know which access point a MAC address uses for network access, we know the approximate location (floor and wing) of the device with that MAC address. We pair these MAC addresses with the link level addresses saved in the packet headers to determine the approximate locations of the hosts in the tcpdump trace. The overhead from the SNMP tracing is low: 530 packets or 50 KBytes is the average overhead from querying all twelve access points every two minutes. The overhead for querying an individual access point is 3.2 KBytes if no MAC addresses are using that access point; otherwise, the base overhead is 14.5 KBytes for one user at an access point, plus 1 KByte for every additional user.
The last trace is the authentication log, which keeps track of which users request authentication to use the network. Each request has both the user's login name as well as the MAC address from which the user makes the request. We pair these MAC addresses with the link-level addresses saved in the tcpdump trace to determine which user sends out each packet.
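The SNMP overhead figures quoted above can be written out as a small calculation. The following Python sketch is only an illustration of that arithmetic: the function names are hypothetical, and the round count assumes queries at exactly two-minute intervals, whereas the text says "approximately".

```python
def per_ap_overhead_kbytes(users: int) -> float:
    """Overhead of one SNMP query to one access point, using the quoted figures:
    3.2 KBytes with no users, 14.5 KBytes for one user, plus 1 KByte per extra user."""
    if users < 0:
        raise ValueError("users must be non-negative")
    if users == 0:
        return 3.2
    return 14.5 + 1.0 * (users - 1)


def round_overhead_kbytes(users_per_ap) -> float:
    """Total overhead of one query round over all access points."""
    return sum(per_ap_overhead_kbytes(u) for u in users_per_ap)


def estimated_rounds(weeks: int, interval_minutes: int = 2) -> int:
    """Number of query rounds, ASSUMING a fixed interval (an idealization)."""
    return weeks * 7 * 24 * 60 // interval_minutes


if __name__ == "__main__":
    # Twelve access points with nobody connected.
    print(round(round_overhead_kbytes([0] * 12), 1))
    # Rough packet count: rounds in 12 weeks times the 530-packet average.
    print(estimated_rounds(12) * 530)
```

Under the fixed-interval assumption, 12 weeks give 60,480 rounds, or about 32 million packets at the quoted 530-packet average. That is the same order of magnitude as the SNMP packet count reported for the combined trace.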
sanitization: We obtained permission to collect these traces from the Department Chair and informed all network users that this tracing was taking place. We additionally informed users we would record packet header information only (not the contents) and that we would anonymize the data. Knowledge of the tracing may have perturbed user behavior, but we have no way of quantifying the effect.
stanford/gates Traceset
combined
This traceset contains traces of the Stanford CS department's wireless network.
- file: final.anon.tar.gz
- description: This traceset contains traces of the Stanford CS department's wireless network.
- measurement purpose: Usage Characterization, User Mobility Characterization
- methodology: We use the common timestamp and MAC address information to combine three traces (tcpdump, SNMP, and authentication logs) into a single trace. The original three traces are not publicly available.
- sanitization: We have anonymized the user and remote host names for privacy reasons.
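The combination step described in the methodology (pairing records by MAC address and timestamp) can be sketched as follows. Because the original three traces are not publicly available, the record layouts here are hypothetical: packets as (time, mac, size), SNMP sightings as (time, access point, mac), and authentication entries as (mac, user). This is a minimal illustration of such a join, not the authors' actual tooling.

```python
from bisect import bisect_right
from collections import defaultdict


def build_location_index(snmp_rows):
    """Map each MAC to parallel, time-sorted lists of sighting times and access points."""
    by_mac = defaultdict(list)
    for t, ap, mac in sorted(snmp_rows):
        by_mac[mac].append((t, ap))
    return {
        mac: ([t for t, _ in seen], [ap for _, ap in seen])
        for mac, seen in by_mac.items()
    }


def locate(index, mac, t):
    """Access point from the latest SNMP sighting at or before time t, else None."""
    times, aps = index.get(mac, ([], []))
    i = bisect_right(times, t)
    return aps[i - 1] if i else None


def combine(packets, snmp_rows, auth_rows):
    """Yield (time, size, user, access_point) for each packet record."""
    index = build_location_index(snmp_rows)
    user_of = {mac: user for mac, user in auth_rows}
    for t, mac, size in packets:
        yield (t, size, user_of.get(mac), locate(index, mac, t))


# Made-up sample values, for illustration only.
packets = [(100, "m1", 60), (250, "m1", 1500), (260, "m2", 40)]
snmp = [(90, "2b", "m1"), (210, "3b", "m1")]
auth = [("m1", "user01")]
combined = list(combine(packets, snmp, auth))
```

Using the most recent sighting at or before the packet time is one plausible design choice; a host with no sighting or no authentication entry simply gets `None` for that field.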
stanford/gates/combined Traces
- anon: This trace contains traces of the Stanford CS department's wireless network.
- configuration: We use the common timestamp and MAC address information to combine these three traces (tcpdump, SNMP, and authentication logs) into a single trace with a total of 78,739,933 packets attributable to the 74 wireless users. An additional 37,893,656 packets are attributable to the SNMP queries and 1,551,167 packets are attributable to the four wired users. The number of packets attributable to the SNMP queries might seem high, but each access point is queried every two minutes even if no laptops are actively generating traffic.
- format: [time] [pkt size] [username] [access point loc] [app] [dir] [remote host]
time is at second granularity.
pkt size is in bytes.
app will be a dotted port number (src/dst) if it's not recognized.
dir is the direction: incoming, outgoing, both (i.e., internal), or neither (i.e., DHCP hadn't really gotten its act together yet).
- note: Note that because we do not record any signal strength information, and since our access points generally cover a whole wing of a floor, we cannot necessarily detect movement within a wing but only movement between access points.
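As an illustration of how the seven-field record format could be read and how movement between access points could be detected, here is a hedged Python sketch. It assumes that fields are whitespace-separated, that no field contains spaces, and that the time field is an integer number of seconds; none of this is confirmed by the description above, so check the readme and the data before relying on it. The sample lines are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    time: int        # seconds (assumed integer)
    size: int        # packet size in bytes
    user: str        # anonymized username
    ap: str          # access point location
    app: str         # application name, or dotted src/dst ports
    direction: str   # dir field
    remote: str      # anonymized remote host


def parse_line(line: str) -> Record:
    """Parse one record, ASSUMING seven whitespace-separated fields."""
    parts = line.split()
    if len(parts) != 7:
        raise ValueError(f"expected 7 fields, got {len(parts)}")
    t, size, user, ap, app, direction, remote = parts
    return Record(int(t), int(size), user, ap, app, direction, remote)


def moves(records):
    """Yield (time, user, old_ap, new_ap) whenever a user's access point changes."""
    last_ap = {}
    for r in records:
        prev = last_ap.get(r.user)
        if prev is not None and prev != r.ap:
            yield (r.time, r.user, prev, r.ap)
        last_ap[r.user] = r.ap


# Invented sample lines; real field values may look different.
sample = [
    "937800000 60 user01 2b http out host01",
    "937800120 1500 user01 2b 1234.80 in host02",
    "937803600 40 user01 3b http out host01",
]
records = [parse_line(s) for s in sample]
detected = list(moves(records))
```

Consistent with the note above, this only reports changes of access point; movement within a wing is invisible to it.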
The files in this directory are a CRAWDAD dataset hosted by IEEE DataPort.
About CRAWDAD: the Community Resource for Archiving Wireless Data At Dartmouth is a data resource for the research community interested in wireless networks and mobile computing.
CRAWDAD was founded at Dartmouth College in 2004, led by Tristan Henderson, David Kotz, and Chris McDonald. CRAWDAD datasets are hosted by IEEE DataPort as of November 2022.
Note: Please use the Data in an ethical and responsible way with the aim of doing no harm to any person or entity for the benefit of society at large. Please respect the privacy of any human subjects whose wireless-network activity is captured by the Data and comply with all applicable laws, including without limitation such applicable laws pertaining to the protection of personal information, security of data, and data breaches. Please do not apply, adapt or develop algorithms for the extraction of the true identity of users and other information of a personal nature, which might constitute personally identifiable information or protected health information under any such applicable laws. Do not publish or otherwise disclose to any other person or entity any information that constitutes personally identifiable information or protected health information under any such applicable laws derived from the Data through manual or automated techniques.
Please acknowledge the source of the Data in any publications or presentations reporting use of this Data.
Citation:
Diane Tang, Mary Baker, stanford/gates, https://doi.org/10.15783/C7NC70, Date: 20031016
Dataset Files
- final.anon.tar.gz (115.46 MB)
Documentation
| Attachment | Size |
|---|---|
| stanford-gates-readme.txt | 1.57 KB |
These datasets are part of Community Resource for Archiving Wireless Data (CRAWDAD). CRAWDAD began in 2004 at Dartmouth College as a place to share wireless network data with the research community. Its purpose was to enable access to data from real networks and real mobile users at a time when collecting such data was challenging and expensive. The archive has continued to grow since its inception, and starting in summer 2022 is being housed on IEEE DataPort.