Datasets
Open Access
CRAWDAD icsi/netalyzr-android
- Submitted by: CRAWDAD Team
- Last updated: Tue, 03/24/2015 - 08:00
- DOI: 10.15783/C7MS39
Abstract
Mobile data collected using the Netalyzr for Android App.
This dataset was collected by the ICSI Netalyzr app for Android to develop a characterization of how operational decisions, such as network configurations, business models, and relationships between operators introduce diversity in service quality and affect user security and privacy. We delve in detail beyond the radio link and into network configuration and business relationships in six countries. We identify the widespread use of transparent middleboxes such as HTTP and DNS proxies, analyzing how they actively modify user traffic, compromise user privacy, and potentially undermine user security. In addition, we identify network sharing agreements between operators, highlighting the implications of roaming and characterizing the properties of MVNOs, including that a majority are simply rebranded versions of major operators. More broadly, our findings using this data highlight the importance of considering higher-layer relationships when seeking to analyze mobile traffic in a sound fashion.
date/time of measurement start: 2013-10-22
date/time of measurement end: 2014-09-01
collection environment: This dataset was collected using the Netalyzr for Android app, which is available for free from the Google Play website for anyone to install and run. We analyzed data for a 9-month period from six countries: US, CA, UK, FR, DE, and AU.
network configuration: Android phones connected through 3G and 4G networks. Rooted and unrooted devices, and multi-user/multi-device.
data collection methodology: The data was collected through crowd-sourcing. Users proactively run the Netalyzr for Android app to troubleshoot their network configuration or to understand their network and how it behaves. No private data is collected without the user's consent.
Google Play link to the app: https://play.google.com/store/apps/details?id=edu.berkeley.icsi.netalyzr...
sanitization: The dataset contains exclusively the sessions used for the core of the paper (MNO and MVNO characterization in the USA, Canada, France, Germany, Great Britain and Australia). We excluded users connected through VPNs, users with public IP addresses, users on international roaming, users connected through femtocells, users with customized network configurations (e.g. custom HTTP proxies and DNS resolvers), and sessions coming from engineering mode networks according to ITU. For the remaining Netalyzr sessions, we excluded sensitive fields such as passwords/usernames of APN settings, location information, base station information, and sensitive information injected into HTTP headers by proxies. IPv4 and IPv6 addresses are anonymized by performing a /16 and /32 sub-netting respectively. FQDNs are also not released, as in many cases they contain information that can identify users. For accessing a larger public Netalyzr dataset with more detailed values and all the collected variables, visit PREDICT: https://www.predict.org
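The prefix-based address anonymization mentioned above can be illustrated with a minimal sketch using Python's standard `ipaddress` module. This is not the authors' actual sanitization code: the function name `truncate_ip` and its parameters are hypothetical, and the prefix lengths are left configurable because this page gives both /32 and /64 for IPv6 in different places.

```python
import ipaddress


def truncate_ip(address, v4_prefix=16, v6_prefix=32):
    """Return the enclosing network of `address`, dropping the host bits.

    Hypothetical helper: the default prefix lengths follow the
    sanitization note (/16 for IPv4, /32 for IPv6).
    """
    ip = ipaddress.ip_address(address)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False lets ip_network() accept an address with host bits set
    # and zero them out instead of raising ValueError.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network)


print(truncate_ip("192.0.2.77"))
print(truncate_ip("2001:db8:1234::1"))
```

Only the network part of each address survives, so individual hosts inside the same prefix become indistinguishable while operator-level analysis remains possible.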
hole: Operator name, MCC/MNC values, as well as extra carrier information can be noisy or missing as a consequence of sessions generated by MVNO subscribers, network sharing agreements between operators, or even due to inconsistencies in Android's API (the dataset comprises handsets running versions from 2.2.3 to 5, which may also be modified by the vendor/mobile provider in their subsidized phones) or inaccurate APN settings on the handset (e.g. sometimes Android returns an empty MCC/MNC or an empty operator name). These sessions can be reconstructed. Other errors may appear on Netalyzr-specific tests (e.g. proxy detection and behavior characterization) due to connectivity problems or peculiar handset configurations. Our MobiSys'15 paper "Beyond the Radio: Illuminating the Higher Layers of Mobile Networks" contains further details about the data sanitization process and the method followed for the study.
error: This dataset was collected through crowd-sourcing, so caution is advised when interpreting the data.
limitation: Due to technical limitations, we cannot release an app for iOS, so this data is limited to Android users.
note: Do not hesitate to contact us at netalyzr-help@icsi.berkeley.edu with questions.
Traceset
icsi/netalyzr-android/middleboxes
Details of middlebox behavior in cellular networks. The traceset contains a subset of the data collected from the Netalyzr for Android App.
- measurement purpose: Network Diagnosis, Network Performance Analysis
IP Addressing: Netalyzr identifies the client's local IP address via Android's APIs and system properties, and uses TCP connections and UDP flows to our echo servers to identify the public IP address of the device. We use the whois tool to identify the organization owning the IP address.
Cellular Network Provider Identification: To identify the network service operator we use Android's TelephonyManager and ConnectivityManager APIs, and extract the APN settings as reported by the handset. This allows us to identify the name of the mobile operator, the name of the operator as reported by the SIM card, the APN providing the service, the cell ID (where users allow it), the 3GPP standard providing the service, as well as the MNC and MCC parameters.
Location: Android allows us to extract city-level device location if the user allows it. This information is useful to identify where roaming happens between mobile operators, and identify locations with poor network performance.
HTTP proxies.
Non-responsive server test: TCP-terminating proxies may be deployed in cellular networks for performance improvement. Such proxies are likely to respond with a SYN-ACK to a client's connection request before connecting to the intended origin server. We test for this behavior by attempting a connection to a server that replies with a RST. If the Netalyzr client's attempt to connect to this server on port 80 initially succeeds, this indicates the presence of a TCP-terminating proxy.
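The core of this check is simply whether a TCP connection attempt appears to succeed from the client's point of view. The sketch below is an illustration only, not Netalyzr's implementation (Netalyzr is an Android app); the function name `tcp_connect_succeeds` and the host name in the usage note are hypothetical.

```python
import socket


def tcp_connect_succeeds(host, port=80, timeout=5.0):
    """Return True if a TCP handshake to (host, port) completes.

    Refused connections, timeouts, and resolution failures all surface
    as OSError subclasses and are reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Pointed at a server that is known never to complete a handshake on port 80 (for example a hypothetical `rst-test.example.net`), a `True` result would suggest that some in-path element answered the SYN on the server's behalf.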
Header modification test: RFC 2616 specifies that systems should treat HTTP header names as case-insensitive, and, with few exceptions, free of ordering requirements. Furthermore, RFC 2616 indicates that any proxy must add the Via header to indicate its presence to intermediate protocols and recipients. Netalyzr fetches custom content from our server using mixed-cased request and response headers in a known order. Any changes indicate the presence of a proxy. This method also allows identifying additional headers added by the HTTP proxy, as in the case of tracking headers, and whether intermediate proxies modify traffic using techniques such as image transcoding, which can affect the fidelity of content delivered to mobile clients through CDNs and other cloud infrastructure.
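The comparison behind this test can be sketched as a pure function over two header lists: the headers as sent, and the headers as observed at the other end. This is a simplified illustration under stated assumptions, not Netalyzr's code: the name `compare_headers`, the sample header names, and the result keys are hypothetical, and repeated header names are not handled.

```python
def compare_headers(sent, received):
    """Compare two lists of (name, value) pairs given in wire order."""
    sent_names = [name for name, _ in sent]
    sent_lower = {name.lower() for name in sent_names}
    recv_names = [name for name, _ in received]

    # Headers seen on receipt that were never sent (e.g. Via, tracking IDs).
    injected = [n for n in recv_names if n.lower() not in sent_lower]
    # Sent headers that survived, in the order they were received.
    surviving = [n for n in recv_names if n.lower() in sent_lower]
    surviving_lower = {n.lower() for n in surviving}
    # Survivors whose exact spelling no longer matches what was sent.
    case_changed = [n for n in surviving if n not in sent_names]
    # Order check ignores case and any headers that were dropped.
    expected = [n.lower() for n in sent_names if n.lower() in surviving_lower]
    reordered = [n.lower() for n in surviving] != expected

    return {
        "injected": injected,
        "case_changed": case_changed,
        "reordered": reordered,
        "via_present": any(n.lower() == "via" for n in recv_names),
    }


sent = [("User-AgEnt", "probe"), ("AcCept", "*/*"), ("X-TeSt", "1")]
seen = [("Accept", "*/*"), ("User-AgEnt", "probe"), ("X-TeSt", "1"),
        ("Via", "1.1 proxy")]
print(compare_headers(sent, seen))
```

Mixed-case names make normalization visible: a proxy that rewrites `AcCept` to `Accept` reveals itself even if it adds no `Via` header.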
HTTP enforcement test: In addition to standard HTTP, Netalyzr attempts to fetch an entity using the protocol declaration ICSI/1.1 instead of HTTP/1.1. If this request is rejected, we know that the network has a protocol-parsing proxy.
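A request with a non-standard protocol declaration cannot be produced by most HTTP client libraries, so it has to be written to the socket as raw bytes. The sketch below shows one way to build such a request and to judge the outcome. It is an illustration only: the function names are hypothetical, and it assumes the measurement server answers both the standard and the `ICSI/1.1` request with status 200 when nothing interferes.

```python
def build_request(path, host, protocol="HTTP/1.1"):
    """Build a raw GET request with a caller-chosen protocol declaration."""
    lines = [f"GET {path} {protocol}", f"Host: {host}", "Connection: close", "", ""]
    return "\r\n".join(lines).encode("ascii")


def status_code(response):
    """Return the numeric status of a raw response, or None if unparsable."""
    first_line = response.split(b"\r\n", 1)[0].decode("ascii", "replace")
    parts = first_line.split()
    if len(parts) >= 2 and parts[1].isdigit():
        return int(parts[1])
    return None


def looks_enforced(standard_response, custom_response):
    """True if the baseline fetch worked but the non-standard one did not."""
    return status_code(standard_response) == 200 and status_code(custom_response) != 200


print(build_request("/test", "server.example.net", protocol="ICSI/1.1"))
```

Comparing against a baseline `HTTP/1.1` fetch helps separate protocol enforcement from ordinary connectivity failures, which would break both requests.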
Invalid Host header value test: CERT VU#435052 describes how some in-path proxies interpret the Host request header and attempt to contact the listed host rather than forward the request to the intended address. We check for this vulnerability by fetching from our server with an alternate Host header of www.google.com. The presence of this vulnerability in commercial proxies is alarming as it suggests that operators may not have upgraded their middlebox software, potentially leaving other vulnerabilities not covered by our test suite.
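As a rough sketch of this check, the client sends a request over a connection opened to the measurement server's address while naming a different host in the Host header, then inspects where the answer came from. The helper names and the idea of a server-specific marker string are assumptions made for illustration; the source does not say how Netalyzr recognizes its own server's response.

```python
def build_host_probe(path, fake_host="www.google.com"):
    """Raw request whose Host header names a host other than the one dialed."""
    request = f"GET {path} HTTP/1.1\r\nHost: {fake_host}\r\nConnection: close\r\n\r\n"
    return request.encode("ascii")


def proxy_followed_host_header(body, marker):
    """True if a non-empty response lacks the marker our server would embed.

    `marker` is a hypothetical token that only the measurement server
    returns. An empty body is treated as inconclusive, not as vulnerable.
    """
    return bool(body) and marker not in body
```

If the response body is missing the marker, the content was fetched from somewhere other than the dialed server, which is consistent with a proxy routing on the Host header.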
icsi/netalyzr-android/middleboxes Trace
- middleboxes-trace: The data exposing middlebox (HTTP and DNS) behaviour in cellular networks
- configuration: Crowdsourced data collection using Netalyzr for Android app
- format: The tuple (id,time,raw_op_name,clean_op_name,country,raw_cellular_technology,3gpp_family,mcc,mnc,apn,apn_name,extra_carrier_info,global_ip,ip_dns,ip_dns_proxy,ip_http_proxy,http_content_change,http_hdr_reorder,http_hdr_injection,invalid_host_name_vulnerability,http_enforcement,http_default_compression,transcoding,dns_direct_mangled,dns_direct_proxy,dns_direct_changed_id,roaming_indicator,rooted,http_header_injected_list)
- id - integer
- time - timestamp
- raw_op_name - operator name as reported by Android's Telephony Manager
- clean_op_name - operator name after applying our filter
- country - device country as reported by Android
- raw_cellular_technology - 3GPP technology as reported by Android Connectivity/Telephony Manager
- 3gpp_family - 3GPP family after applying our filter (i.e. UMTS/HSPA, LTE, CDMA)
- mcc - Mobile Country Code. Assigned by ITU. Identifies the country. As reported by Android's Telephony Manager
- mnc - Mobile Network Code. Assigned by ITU. Identifies the operator (generally the radio operator). As reported by Android's Telephony Manager
- apn - APN information (not all Android devices return a value)
- apn_name - APN Name (not all Android devices return a value)
- extra_carrier_info - Optionally supplied extra information about the network state. Provided by Android Connectivity Manager
- global_ip - Public IP address (/16 for IPv4 and /64 for IPv6)
- ip_dns - IP address of the default DNS Resolver (as seen by Netalyzr)
- ip_dns_proxy - IP address of a DNS proxy (as seen from the Netalyzr server).
- ip_http_proxy - IP address of the in-network HTTP proxy (as seen from the Netalyzr server).
- http_content_change - HTTP content has been modified. Boolean; not as reported by Android
- rooted - Whether the phone is rooted or not (allows executing "su"). Security vulnerability.
- http_header_injected_list - List of HTTP headers injected by the proxy.
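A minimal sketch for loading the trace with Python's standard library follows. The field order is taken from the format tuple above. Everything else is an assumption not confirmed by this page: that the file is comma-delimited, that it is UTF-8 text, and that it may or may not start with a header row. The function names are hypothetical, and all values are left as strings because the encoding of booleans and lists is not documented here.

```python
import csv
import gzip

# Field order as given in the "format" tuple of the trace description.
FIELDS = (
    "id,time,raw_op_name,clean_op_name,country,raw_cellular_technology,"
    "3gpp_family,mcc,mnc,apn,apn_name,extra_carrier_info,global_ip,ip_dns,"
    "ip_dns_proxy,ip_http_proxy,http_content_change,http_hdr_reorder,"
    "http_hdr_injection,invalid_host_name_vulnerability,http_enforcement,"
    "http_default_compression,transcoding,dns_direct_mangled,"
    "dns_direct_proxy,dns_direct_changed_id,roaming_indicator,rooted,"
    "http_header_injected_list"
).split(",")


def read_trace(lines):
    """Yield one dict per data row from an iterable of CSV text lines."""
    for row in csv.reader(lines):
        if row == FIELDS:
            continue  # assumed optional header row
        if len(row) != len(FIELDS):
            continue  # skip rows that do not match the assumed layout
        yield dict(zip(FIELDS, row))


def open_trace(path="middleboxes-trace.csv.gz"):
    """Open the gzip-compressed trace as text (UTF-8 is an assumption)."""
    return gzip.open(path, "rt", encoding="utf-8", newline="")
```

With the file downloaded locally, `with open_trace() as f: rows = list(read_trace(f))` would give a list of dictionaries keyed by field name. Rows skipped for having the wrong length are worth counting before drawing conclusions, since they may indicate a different delimiter or quoting convention.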
The files in this directory are a CRAWDAD dataset hosted by IEEE DataPort.
About CRAWDAD: the Community Resource for Archiving Wireless Data At Dartmouth is a data resource for the research community interested in wireless networks and mobile computing.
CRAWDAD was founded at Dartmouth College in 2004, led by Tristan Henderson, David Kotz, and Chris McDonald. CRAWDAD datasets are hosted by IEEE DataPort as of November 2022.
Note: Please use the Data in an ethical and responsible way with the aim of doing no harm to any person or entity for the benefit of society at large. Please respect the privacy of any human subjects whose wireless-network activity is captured by the Data and comply with all applicable laws, including without limitation such applicable laws pertaining to the protection of personal information, security of data, and data breaches. Please do not apply, adapt or develop algorithms for the extraction of the true identity of users and other information of a personal nature, which might constitute personally identifiable information or protected health information under any such applicable laws. Do not publish or otherwise disclose to any other person or entity any information that constitutes personally identifiable information or protected health information under any such applicable laws derived from the Data through manual or automated techniques.
Please acknowledge the source of the Data in any publications or presentations reporting use of this Data.
Citation:
Narseo Vallina-Rodriguez, Srikanth Sundaresan, Christian Kreibich, Nicholas Weaver, Vern Paxson, icsi/netalyzr-android, https://doi.org/10.15783/C7MS39 , Date: 20150324
Dataset Files
- middleboxes-trace.csv.gz (61.08 kB)
Open Access dataset files are accessible to all logged in users.
Documentation
| Attachment | Size |
|---|---|
| icsi-netalyzr-android-readme.txt | 1.64 KB |