CRAWDAD ibm/watson

Citation Author(s): Magdalena Balazinska (University of Washington), Paul Castro (IBM T.J. Watson Research Center)
Submitted by: CRAWDAD Team
Last updated: Thu, 11/09/2006 - 08:00
DOI: 10.15783/C7RG6K

Abstract 

This dataset includes SNMP records for a corporate research center (IBM Watson research center) over several weeks.

date/time of measurement start: 2002-07-20

date/time of measurement end: 2002-08-18

collection environment: The 802.11b wireless local-area network that we studied is spread throughout three large corporate buildings hosting computer science and electrical engineering research groups. The largest of the buildings, which we call LBldg, has 131 access points and is approximately 10 miles away from the other buildings. The other buildings, MBldg and SBldg, are adjacent to each other. They have 36 and 10 access points respectively. The placement of access points in buildings is based on geometry (one access point per corridor, for instance). Extra access points are placed in a few highly used rooms, such as a customer laboratory in SBldg.

network configuration: The network is configured to run in infrastructure mode, in which wireless clients connect to the wired network through access points distributed in the environment. All 177 access points are Cisco Aironet 350s. We observed a total of 1366 unique MAC addresses. Laptops were by far the predominant devices on the network. We have no information on whether any other types of devices were used at all. We assume that each unique MAC address corresponds to a user, even though it is possible for a single user to have more than one MAC address or for users to trade cards with each other.

data collection methodology: We used SNMP to poll access points every 5 minutes, from Saturday, July 20th 2002 through Sunday, August 18th 2002.
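Because polling was nominally periodic, missing or stretched samples show up as unusually long spacing between consecutive poll timestamps for one access point. The following is a minimal sketch of that check, not part of the original tooling. The function name `find_gaps`, the one-minute tolerance, and the assumption that the `day` and `moment` fields have already been parsed into `datetime` objects are all illustrative choices.

```python
from datetime import datetime, timedelta


def find_gaps(poll_times, expected=timedelta(minutes=5),
              tolerance=timedelta(minutes=1)):
    """Return (previous, next) timestamp pairs whose spacing exceeds
    the expected polling interval plus a tolerance.

    poll_times: iterable of datetime objects for ONE access point.
    The tolerance is an arbitrary illustrative value.
    """
    ordered = sorted(poll_times)
    # Compare every timestamp with its successor.
    return [(a, b) for a, b in zip(ordered, ordered[1:])
            if b - a > expected + tolerance]
```

For example, passing timestamps that skip from 12:55 to 14:00 on the same day would return that single pair, while evenly spaced 5-minute samples return an empty list.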

sanitization: Users were not informed that the study was performed. The only sensitive information that we gathered was the MAC and IP addresses of network cards, as well as the names assigned to access points. To ensure user privacy, we anonymized all three types of information.
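The traceset notes describe the anonymization as a SHA-1 hash of each value concatenated with a secret, with the anonymized site name prepended for access point names. The sketch below illustrates that general scheme only. The concatenation order (value, then secret), the UTF-8 encoding, the hexadecimal output, and the absence of a separator between site name and hash are assumptions, so it should not be expected to reproduce the identifiers in the dataset.

```python
import hashlib


def anonymize(value: str, secret: str) -> str:
    """SHA-1 of a value concatenated with a secret, as a hex string.

    Assumptions (not stated in the dataset notes): the secret is
    appended after the value, both are UTF-8 encoded, and the digest
    is rendered in hexadecimal.
    """
    return hashlib.sha1((value + secret).encode("utf-8")).hexdigest()


def anonymize_ap_name(site: str, ap_name: str, secret: str) -> str:
    """Prepend the anonymized site name (e.g. "LBldg") to the hashed
    access point name. The lack of a separator is an assumption."""
    return site + anonymize(ap_name, secret)
```

The same input and secret always yield the same identifier, which is what lets a MAC address be tracked across polls without revealing its real value.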

Traceset

ibm/watson/snmp

This traceset includes SNMP records collected by polling APs every 5 minutes in a corporate research center (IBM Watson research center) over several weeks.

  • file: anon-data.tar.gz
  • methodology: We used SNMP to poll access points every 5 minutes, from Saturday, July 20th 2002 through Sunday, August 18th 2002. We chose 5-minute intervals to ensure that our study would not affect access point performance. We collected information about the traffic going through each access point as well as about the list of users associated with each access point. For each user, we retrieved detailed information on the amount of data (bytes and packets) transferred, the error rates, the latest signal strength, and the latest signal quality. We polled all access points except three located in MBldg that did not respond to SNMP requests.
  • sanitization: Site names have been anonymized into LBldg, MBldg, and SBldg. Access point names have been anonymized by computing the SHA-1 hash of their name (concatenated with a secret) and prepending the anonymized site name to it. MAC addresses and IP addresses have been anonymized by computing the SHA-1 hash of their values (concatenated with a secret).
  • hole: Due to a power failure, there is a one-hour hole in the data (07/30/2002 from 1pm to 2pm). For unknown reasons, we also have a few holes in the data gathered at a few of the access points during the evening and night of 08/08/2002. Due to periods where access points were heavily loaded, some sample intervals stretch to 10 min.
  • note: The data is organized by day: there is one directory for every day of the trace. There are three files (traces) for each access point and for each day. File names start with the name of the access point. The suffix of a file name indicates the type of information the file contains. On every poll of an access point, we appended data to each of these three files:

- (ibm_corporate-snmp-ap) File name ending with .snmp = table with data on the access point 
- (ibm_corporate-snmp-interfaces) File name ending with -interfaces.snmp = table with data on the  access point's wireless interface
- (ibm_corporate-snmp-users) File name ending with -users.snmp = table with data on users
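The naming convention above has one pitfall when sorting files by type: all three suffixes end in `.snmp`, so the two longer suffixes have to be tested before the bare `.snmp` case. A minimal sketch follows. The function name `classify_trace_file` and the short type labels it returns are illustrative, not part of the dataset.

```python
def classify_trace_file(filename):
    """Split a trace file name into (access_point_name, trace_type).

    trace_type is "interfaces", "users", or "ap". The longer suffixes
    are checked first because they also end in ".snmp".
    """
    suffixes = (
        ("-interfaces.snmp", "interfaces"),
        ("-users.snmp", "users"),
        (".snmp", "ap"),
    )
    for suffix, kind in suffixes:
        if filename.endswith(suffix):
            return filename[: -len(suffix)], kind
    raise ValueError(f"not a recognized trace file: {filename!r}")
```

With a hypothetical access point name `LBldgabc123`, the three files `LBldgabc123.snmp`, `LBldgabc123-interfaces.snmp`, and `LBldgabc123-users.snmp` all map back to the same access point with different types.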

ibm/watson/snmp traces

  • ap: SNMP records on APs (MIB-II). This trace includes SNMP records about AP information such as number of inbound/outbound packets.
    • configuration: SNMP polling on each access point every 5 minutes
    • format: Each trace consists of 14 fields as follows:

1. site (string, anonymized)
2. day (date)
3. moment (time)
4. name (string, anonymized)
5. sysUpTime (time)
6. snmpInPkts (int unsigned)
7. snmpOutPkts (int unsigned)
8. ipIn (int unsigned)
9. ipOut (int unsigned)
10. ipFwd (int unsigned)
11. tcpIn (int unsigned)
12. tcpOut (int unsigned)
13. udpIn (int unsigned)
14. udpOut (int unsigned)

The following is the description of each field:
- site: the building where the access point was located
- day and moment: timestamp of the poll
- name: anonymized access point name

From the standard MIB-II (RFC1213): Management  Information Base for Network Management of TCP/IP-based Internets, we collected the following information for each access point:

- sysUpTime: The time (in hundredths of a second) since the network management portion of the system was last re-initialized.
- snmpInPkts: The total number of Messages delivered to the SNMP entity from the transport service.
- snmpOutPkts: The total number of SNMP Messages which were passed from the SNMP protocol entity to the transport service.
- ipInReceives (ipIn): The total number of input datagrams received from interfaces, including those received in error.
- ipOutRequests (ipOut): The total number of IP datagrams which local IP user-protocols (including ICMP) supplied to IP in requests for transmission. Note that this counter does not include any datagrams counted in ipForwDatagrams.
- ipForwDatagrams (ipFwd): The number of input datagrams for which this entity was not their final IP destination, as a result of which an attempt was made to find a route to forward them to that final destination. In entities which do not act as IP Gateways, this counter will include only those packets which were Source-Routed via this entity, and the Source-Route option processing was successful.
- tcpInSegs (tcpIn): The total number of segments received, including those received in error. This count includes segments received on currently established connections.
- tcpOutSegs (tcpOut): The total number of segments sent, including those on current connections but excluding those containing only retransmitted octets.
- udpInDatagrams (udpIn): The total number of UDP datagrams delivered to UDP users.
- udpOutDatagrams (udpOut): The total number of UDP datagrams sent from this entity.
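The MIB-II objects listed above are cumulative 32-bit counters, so per-interval traffic comes from the difference between two consecutive polls, and a counter can wrap past 2^32 - 1 between samples. A drop in sysUpTime suggests the access point restarted, in which case the counters were reset and the difference is not meaningful. The sketch below assumes the trace columns hold the raw counter values and that sysUpTime is available as an integer tick count. Both function names are illustrative.

```python
COUNTER32_MODULUS = 2 ** 32  # MIB-II Counter values wrap at 2^32


def counter_delta(prev: int, curr: int) -> int:
    """Increase of a 32-bit SNMP counter between two polls,
    allowing for at most one wraparound in between."""
    return (curr - prev) % COUNTER32_MODULUS


def restarted(prev_uptime_ticks: int, curr_uptime_ticks: int) -> bool:
    """True if sysUpTime went backwards between two polls, which
    suggests a restart (counters reset, so deltas should be skipped)."""
    return curr_uptime_ticks < prev_uptime_ticks
```

A single wraparound is the most this arithmetic can detect: if a counter wraps more than once within one polling interval, the computed difference underestimates the true traffic.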

  • interfaces: SNMP records on APs (MIB-II). This trace includes SNMP records about each AP's network interface, such as bytes of inbound/outbound traffic, number of errors, and number of discarded packets.
    • configuration: SNMP polling on each access point's wireless interface every 5 minutes
    • format: Each trace consists of 16 fields as follows:

1. site (string, anonymized)
2. day (date)
3. moment (time)
4. name (string, anonymized)
5. ifIndex (int)
6. ifType (string)
7. ifSpeed (int unsigned)
8. ifPhysAddress (string, anonymized)
9. ifInOct (int unsigned)
10. ifInUcastPkts (int unsigned)
11. ifInErrors (int unsigned)
12. ifInDiscards (int unsigned)
13. ifOutOct (int unsigned)
14. ifOutUcastPkts (int unsigned)
15. ifOutErrors (int unsigned)
16. ifOutDiscards (int unsigned)

The following is the description of each field.
- site: building where access point is located
- day + moment : timestamp of poll
- name: name of access point
- ifIndex and ifType: interface index and type (to recognize the wireless interface)
From the standard MIB-II (RFC1213): Management Information Base for Network Management of TCP/IP-based Internets, we collected the following information for each access point's wireless interface:
- ifSpeed: An estimate of the interface's current bandwidth in bits per second.  For interfaces which do not vary in bandwidth or for those where no accurate estimation can be made, this object should contain the nominal bandwidth.
- ifPhysAddress: The interface's address at the protocol layer immediately 'below' the network layer in the protocol stack. For interfaces which do not have such an address (e.g., a serial line), this object should contain an octet string of zero length.
- ifInOctets (ifInOct): The total number of octets received on the interface, including framing characters.
- ifInUcastPkts: The number of subnetwork-unicast packets delivered to a higher-layer protocol.
- ifInErrors: The number of inbound packets that contained errors preventing them from being deliverable to a higher-layer protocol.
- ifInDiscards: The number of inbound packets which were chosen to be discarded even though no errors had been detected to prevent their being deliverable to a higher-layer protocol.  One possible reason for discarding such a packet could be to free up buffer space.
- ifOutOctets (ifOutOct): The total number of octets transmitted out of the interface, including framing characters.
- ifOutUcastPkts: The total number of packets that higher-level protocols requested be transmitted to a subnetwork-unicast address, including those that were discarded or not sent.
- ifOutErrors: The number of outbound packets that could not be transmitted because of errors.
- ifOutDiscards: The number of outbound packets which were chosen to be discarded even though no errors had been detected to prevent their being transmitted.  One possible reason for discarding such a packet could be to free up buffer space.
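One common use of these fields is to estimate how busy the wireless interface was during a polling interval: the octet counter difference, converted to bits, divided by what the interface could have carried at ifSpeed over the same time. The sketch below is one way to compute that under stated assumptions: the octet columns hold raw 32-bit counters, the interval length is known in seconds, and the function name `utilization` is illustrative.

```python
def utilization(prev_octets: int, curr_octets: int,
                interval_s: float, if_speed_bps: int) -> float:
    """Fraction of nominal interface capacity used in one direction
    between two polls.

    prev_octets / curr_octets: consecutive ifInOctets (or ifOutOctets)
    samples; one 32-bit wraparound between them is tolerated.
    interval_s: seconds between the two polls.
    if_speed_bps: ifSpeed, in bits per second.
    """
    delta_octets = (curr_octets - prev_octets) % (2 ** 32)
    return (delta_octets * 8) / (interval_s * if_speed_bps)
```

As an illustration with made-up numbers, 41,250,000 octets received in a 300-second interval on an interface reporting 11,000,000 bits per second (the nominal 802.11b rate) corresponds to a utilization of about 0.1. Since ifSpeed is a nominal figure, the result is an upper-level estimate rather than a measure of achievable wireless throughput.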

  • users: SNMP records on users' network interface. This trace includes SNMP records about network users such as number of packets and bytes from/to each user's machine.
    • configuration: SNMP polling on each access point's associated users every 5 minutes
    • format: Each trace contains the following 22 fields:

1. site (string, anonymized)
2. day (date)
3. moment (time)
4. parent (string, anonymized)
5. aid (int unsigned)
6. state (string, only "associated" users recorded)
7. shortRet (int unsigned)
8. longRet (int unsigned)
9. strength (int)
10. quality (int)
11. mac (string, anonymized)
12. classID (string, only "clientStations" recorded)
13. srcPkts (int unsigned)
14. srcOct (int unsigned)
15. srcErrPkts (int unsigned)
16. srcErrOct (int unsigned)
17. dstPkts (int unsigned)
18. dstOct (int unsigned)
19. dstErrPkts (int unsigned)
20. dstErrOct (int unsigned)
21. dstMaxRetryErr (int unsigned)
22. ip (string, anonymized)

- site: building where access point is located
- day + moment : timestamp of poll
- parent: name of access point

From the Cisco Aironet Access Point MIB (AWCVX-MIB.my) we collected information about users:

- awcDot11TpFdbAID (aid): AID with which the Station is associated with this system, or 2008 if the Station is not currently known to be associated. If the entry is multicast, awcDot11TpFdbAID is 0. Note that the uplink from a Client or Repeater AP to its parent is always AID 1.
- awcDot11TpFdbClientState (state): 802.11 Service State of the Station. The state can be one of the following: state0 (Station is not able to send any frames whatsoever. It is most likely not yet configured), state1 (Station can send Class 1 frames. It is Unauthenticated and Unassociated), state2 (Station can send Class 2 frames. It is Authenticated, but is as yet Unassociated), and state3 (Station can send Class 3 frames. It is both Authenticated and Associated).
- awcDot11TpFdbTxShortRetries (shortRet): The total number of 802.11 Short Retries (RTS retries) incurred across all packet Transmission Attempts to this Station.
- awcDot11TpFdbTxLongRetries (longRet): The total number of 802.11 Long Retries (data retries) incurred across all packet Transmission Attempts to this Station.
- awcDot11TpFdbLatestRxSignalStrength (strength): A device-dependent measure of the signal strength of the most recently received packet from this Station. Might be normalized or unnormalized.
- awcDot11TpFdbLatestRxSignalQuality (quality): A device-dependent measure of the signal quality of the most recently received packet from this Station.
- awcDot11TpFdbAddress (mac): MAC address
- awcTpFdbSrcPktsImmed (srcPkts): Number of observed packets for which this station was the source.
- awcTpFdbSrcOctetsImmed (srcOct): Number of observed octets for which this station was the source.
- awcTpFdbSrcErrorPktsImmed (srcErrPkts): Number of observed error packets for which this station was the source.
- awcTpFdbSrcErrorOctetsImmed (srcErrOct): Number of observed error octets for which this entry was the source.
- awcTpFdbDestPktsImmed (dstPkts): Number of observed packets for which this station was the destination.
- awcTpFdbDestOctetsImmed (dstOct): Number of observed octets for which this station was the destination.
- awcTpFdbDestErrorPktsImmed (dstErrPkts): Number of observed error packets for which this station was the destination. This count includes awcTpFdbDestMaxRetryErrorsImmed.
- awcTpFdbDestErrorOctetsImmed (dstErrOct): Number of observed error octets for which this station was the destination.
- awcTpFdbDestMaxRetryErrorsImmed (dstMaxRetryErr): Number of observed max-retry error packets for which this station was the destination.
- awcTpFdbIPv4Addr (ip): IPv4 network address of the station.
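Since the dataset description treats each unique MAC address as one user, a simple first analysis of the users trace is to count distinct anonymized MAC addresses per access point. The sketch below assumes the records have already been parsed into dictionaries keyed by the field names `parent` and `mac` from the format list; the parsing step and the function name `users_per_ap` are illustrative and not part of the dataset.

```python
from collections import defaultdict


def users_per_ap(records):
    """Count distinct anonymized MAC addresses seen at each access point.

    records: iterable of dicts with at least the keys "parent"
    (anonymized access point name) and "mac" (anonymized MAC address).
    """
    seen = defaultdict(set)
    for rec in records:
        seen[rec["parent"]].add(rec["mac"])
    # Convert the sets of MAC addresses into plain counts.
    return {ap: len(macs) for ap, macs in seen.items()}
```

Repeated polls of the same station at the same access point count once, so the result reflects distinct users rather than the number of records.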

Instructions: 

The files in this directory are a CRAWDAD dataset hosted by IEEE DataPort. 

About CRAWDAD: the Community Resource for Archiving Wireless Data At Dartmouth is a data resource for the research community interested in wireless networks and mobile computing. 

CRAWDAD was founded at Dartmouth College in 2004, led by Tristan Henderson, David Kotz, and Chris McDonald. CRAWDAD datasets are hosted by IEEE DataPort as of November 2022. 

Note: Please use the Data in an ethical and responsible way with the aim of doing no harm to any person or entity for the benefit of society at large. Please respect the privacy of any human subjects whose wireless-network activity is captured by the Data and comply with all applicable laws, including without limitation such applicable laws pertaining to the protection of personal information, security of data, and data breaches. Please do not apply, adapt or develop algorithms for the extraction of the true identity of users and other information of a personal nature, which might constitute personally identifiable information or protected health information under any such applicable laws. Do not publish or otherwise disclose to any other person or entity any information that constitutes personally identifiable information or protected health information under any such applicable laws derived from the Data through manual or automated techniques. 

Please acknowledge the source of the Data in any publications or presentations reporting use of this Data. 

Citation:

Magdalena Balazinska, Paul Castro, ibm/watson, https://doi.org/10.15783/C7RG6K , Date: 20030219


Documentation

Attachment: ibm-watson-readme.txt (1.57 KB)

