Standard Dataset
Passive network measurements from real mobile devices
- Submitted by:
- Lucas Torrealba...
- Last updated:
- Fri, 01/03/2025 - 09:55
- DOI:
- 10.21227/9rbv-hm68
Abstract
Along with the continuous growth of Internet usage, mobile users are becoming increasingly relevant, as they account for the largest share of web traffic. Consequently, a large and growing body of literature relies on cellular data to gain a deeper understanding of several Internet-related concerns. Nevertheless, accessing high-quality cellular datasets can be a challenge for research teams, because such datasets are scarce and access to them is restricted. To address this issue, we present a novel measurement methodology for mobile devices that passively monitors Internet traffic in user space and provides a comprehensive set of contextual information about the user and the network. The methodology makes it possible to collect inexpensive crowdsourced measurements from medium-sized sets of mobile users while still obtaining valuable complementary insights into the target population. Indeed, we show that its outcomes are comparable to and consistent with large-scale studies on critical topics such as human mobility patterns, traffic consumption trends, adoption of networking protocols, and penetration of mobile network technologies. Further, the results of this study show the feasibility of conducting valuable research based on inexpensive network measurements from real mobile devices.
Table: pepa_ping (2020, 2021, 2023)
Stores information about all TCP and UDP connections collected during system runs using the PePa Ping methodology. The data includes the destination IP address and port, the network protocol, start and end times (with microsecond precision), bytes transmitted and received, and the package name of the Android application that established the connection. For TCP connections, additional data is collected from the Linux kernel's tcp_info structure: average round-trip time (RTT), minimum RTT, RTT variance (jitter), number of lost packets, sending congestion window, and sending and receiving maximum segment sizes. Each entry in the pepa_ping table references the environmental_data record of the one-minute execution of the measurement system during which the connection was observed, linking network flows to the context of that time period.
- 1. Field: id
- Description: Connection identifier (primary key)
- Data type: integer
- 2. Field: dst_ip
- Description: Destination IP address
- Data type: character varying
- 3. Field: dst_port
- Description: Destination port
- Data type: integer
- 4. Field: protocol
- Description: Transport protocol of the connection (TCP or UDP)
- Data type: character varying
- 5. Field: start_time_sec
- Description: Connection start time in seconds
- Data type: double precision
- 6. Field: start_time_usec
- Description: Connection start time in microseconds
- Data type: double precision
- 7. Field: end_time_sec
- Description: Connection end time in seconds
- Data type: double precision
- 8. Field: end_time_usec
- Description: Connection end time in microseconds
- Data type: double precision
- 9. Field: tx_bytes
- Description: Bytes transmitted over the connection
- Data type: integer
- 10. Field: rx_bytes
- Description: Bytes received over the connection
- Data type: integer
- 11. Field: min_rtt
- Description: Minimum RTT of the connection
- Data type: double precision
- Notes: Only if protocol==tcp
- 12. Field: rtt
- Description: Average round-trip time (RTT) of the connection
- Data type: double precision
- Notes: Only if protocol==tcp
- 13. Field: rtt_var
- Description: RTT variance (jitter) of the connection
- Data type: double precision
- Notes: Only if protocol==tcp
- 14. Field: lost_packets
- Description: Number of packets lost during the connection
- Data type: double precision
- Notes: Only if protocol==tcp
- 15. Field: snd_cwnd
- Description: Sending congestion window
- Data type: integer
- Notes: Only if protocol==tcp
- 16. Field: snd_mss
- Description: Sending maximum segment size
- Data type: integer
- Notes: Only if protocol==tcp
- 17. Field: rcv_mss
- Description: Receiving maximum segment size
- Data type: integer
- Notes: Only if protocol==tcp
- 18. Field: package_name
- Description: Package name of the Android application that established the connection
- Data type: character varying
- 19. Field: environmental_data
- Description: Foreign key to environmental_data table
- Data type: integer
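Start and end times are split into a seconds field and a microseconds field, and the RTT-related columns are meaningful only for TCP rows. The following is a minimal sketch of how a reader might combine these fields. It assumes, as with a POSIX struct timeval, that the *_usec columns hold only the sub-second part (0 to 999,999); if they turn out to hold full timestamps in microseconds, to_seconds must be adapted. The Flow class, the helper names, and all values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Flow:
    # Hypothetical in-memory view of a pepa_ping row (subset of columns).
    protocol: str
    start_time_sec: float
    start_time_usec: float
    end_time_sec: float
    end_time_usec: float
    tx_bytes: int
    rx_bytes: int
    rtt: Optional[float] = None  # left empty for UDP rows


def to_seconds(sec: float, usec: float) -> float:
    """Combine a seconds field and a microseconds field into one value.

    Assumes usec is only the sub-second part (0 <= usec < 1e6).
    """
    return sec + usec / 1_000_000


def duration(flow: Flow) -> float:
    """Connection duration in seconds."""
    return (to_seconds(flow.end_time_sec, flow.end_time_usec)
            - to_seconds(flow.start_time_sec, flow.start_time_usec))


def throughput_bps(flow: Flow) -> Optional[float]:
    """Average bits per second over both directions; None if duration <= 0."""
    d = duration(flow)
    if d <= 0:
        return None
    return 8 * (flow.tx_bytes + flow.rx_bytes) / d


def tcp_rtts(flows: List[Flow]) -> List[float]:
    """RTT values, read only from TCP rows where the column is populated."""
    return [f.rtt for f in flows
            if f.protocol.lower() == "tcp" and f.rtt is not None]


# Illustrative rows only, not taken from the dataset.
flows = [
    Flow("tcp", 100, 250_000, 102, 750_000, 1_000, 9_000, rtt=42.0),
    Flow("udp", 200, 0, 201, 0, 500, 1_500),
]
print(duration(flows[0]))        # 2.5
print(throughput_bps(flows[1]))  # 16000.0
print(tcp_rtts(flows))           # [42.0]
```

Lower-casing protocol before comparing is a defensive choice, since the exact casing stored in the column is not documented here.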
Table: environmental_data (2020)
Stores information that remains constant throughout a one-minute measurement, such as the start timestamp and the user device identifier; the other tables reference this table. It also stores the contextual information captured at the start and end of the minute, including wifi_frequency, wifi_rssi, mobile_rssi, network_type, and connection_type.
- 1. Field: environmental_data
- Description: Environmental data identifier (primary key)
- Data type: integer
- 2. Field: wifi_frequency_start
- Description: WiFi frequency at the start of data collection
- Data type: double precision
- 3. Field: wifi_rssi_start
- Description: WiFi Received Signal Strength Indicator at the start of data collection
- Data type: integer
- 4. Field: network_type_start
- Description: Network type at the start of data collection
- Data type: character varying
- 5. Field: connection_type_start
- Description: Connection type at the start of data collection
- Data type: character varying
- 6. Field: timestamp_start
- Description: Start timestamp associated with the environment
- Data type: timestamp without time zone
- 7. Field: wifi_frequency_end
- Description: WiFi frequency at the end of data collection
- Data type: double precision
- 8. Field: wifi_rssi_end
- Description: WiFi Received Signal Strength Indicator at the end of data collection
- Data type: integer
- 9. Field: mobile_rssi_end
- Description: Mobile Received Signal Strength Indicator at the end of data collection
- Data type: integer
- 10. Field: network_type_end
- Description: Network type at the end of data collection
- Data type: character varying
- 11. Field: connection_type_end
- Description: Connection type at the end of data collection
- Data type: character varying
- 12. Field: timestamp_end
- Description: End timestamp associated with the environment
- Data type: timestamp without time zone
- 13. Field: app_version
- Description: Version of the application
- Data type: Version of the application
- 14. Field: uuid
- Description: Identifier for the user
- Data type: character varying
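Because each pepa_ping row carries the environmental_data key of its one-minute run, flows can be grouped per run and per user with an ordinary join. The sketch below uses Python's built-in sqlite3 module with a hypothetical subset of columns and made-up rows; the connection-type strings ("Wifi", "Mobile") are assumed values, and the real tables have more columns and may use different types.

```python
import sqlite3

# Minimal, hypothetical subset of the two tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE environmental_data (
    environmental_data INTEGER PRIMARY KEY,
    uuid TEXT,
    connection_type_start TEXT,
    connection_type_end TEXT
);
CREATE TABLE pepa_ping (
    id INTEGER PRIMARY KEY,
    package_name TEXT,
    rx_bytes INTEGER,
    environmental_data INTEGER
        REFERENCES environmental_data(environmental_data)
);
""")

# Illustrative rows only, not taken from the dataset.
conn.executemany("INSERT INTO environmental_data VALUES (?, ?, ?, ?)", [
    (1, "user-a", "Wifi", "Wifi"),
    (2, "user-a", "Wifi", "Mobile"),
])
conn.executemany("INSERT INTO pepa_ping VALUES (?, ?, ?, ?)", [
    (10, "com.example.video", 5000, 1),
    (11, "com.example.video", 3000, 2),
    (12, "com.example.chat", 200, 2),
])

# Received bytes per one-minute run, plus a 0/1 flag telling whether the
# connection type differed between the start and the end of that minute.
rows = conn.execute("""
SELECT e.uuid,
       e.environmental_data,
       SUM(p.rx_bytes) AS rx_total,
       e.connection_type_start <> e.connection_type_end AS switched
FROM pepa_ping AS p
JOIN environmental_data AS e
  ON p.environmental_data = e.environmental_data
GROUP BY e.environmental_data
ORDER BY e.environmental_data
""").fetchall()
print(rows)  # [('user-a', 1, 5000, 0), ('user-a', 2, 3200, 1)]
```

Grouping by the primary key keeps uuid and the start/end columns well defined for each group, so the same query shape should carry over to a PostgreSQL copy of the tables.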
Table: server_information (2020)
Stores information about the entity managing a specific IP address, obtained from TLS/SSL certificate metadata and reverse DNS resolutions.
- 1. Field: ip
- Description: IPv4 address
- Data type: character varying(15)
- 2. Field: common_name
- Description: Common name (CN) from the TLS/SSL certificate, or the name obtained by reverse DNS (dns_reverse), associated with the IP address
- Data type: character varying(100)
- 3. Field: organization
- Description: Organization (O) from the TLS/SSL certificate
- Data type: character varying(100)
Table: server_information (2021)
Stores information about the entity managing a specific IP address, obtained from TLS/SSL certificate metadata and reverse DNS resolutions.
- 1. Field: ip
- Description: IPv4 address
- Data type: character varying(15)
- 2. Field: common_name
- Description: Common name (CN) from the TLS/SSL certificate, or the name obtained by reverse DNS (dns_reverse), associated with the IP address
- Data type: character varying(100)
- 3. Field: organization
- Description: Organization (O) from the TLS/SSL certificate
- Data type: character varying(100)
- 4. Field: source
- Description: Method from which the information was extracted: dns_reverse or SSL/TLS
- Data type: character varying(20)
- 5. Field: date
- Description: Date on which the information was obtained
- Data type: date
Table: server_information (2023)
Stores information about the entity managing a specific IP address, obtained from TLS/SSL certificate metadata and reverse DNS resolutions.
- 1. Field: ip
- Description: IPv4 address
- Data type: character varying(15)
- 2. Field: common_name
- Description: Common name (CN) from the TLS/SSL certificate, or the name obtained by reverse DNS (dns_reverse), associated with the IP address
- Data type: character varying(100)
- 3. Field: organization
- Description: Organization (O) from the TLS/SSL certificate
- Data type: character varying(100)
- 4. Field: date
- Description: Date on which the information was obtained
- Data type: date
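In the 2021 and 2023 versions, the same IP address may appear with more than one date, and organization may be empty when only a reverse DNS name was found. One possible way to attribute flow bytes to an owner is sketched below: keep the most recent record per IP, prefer organization, and fall back to common_name. This labeling policy and the helper names (latest_by_ip, owner_label, bytes_per_owner) are hypothetical choices, not the authors' procedure. Tuples follow the 2023 column order; the 2021 version adds a source column before date.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

# (ip, common_name, organization, date), mirroring the 2023 column order.
Record = Tuple[str, Optional[str], Optional[str], date]


def latest_by_ip(records: List[Record]) -> Dict[str, Record]:
    """Keep, for each IP address, the most recently obtained record."""
    latest: Dict[str, Record] = {}
    for rec in records:
        ip, _, _, obtained = rec
        if ip not in latest or obtained > latest[ip][3]:
            latest[ip] = rec
    return latest


def owner_label(rec: Record) -> str:
    """Prefer the certificate organization; fall back to common_name."""
    _, common_name, organization, _ = rec
    return organization or common_name or "unknown"


def bytes_per_owner(flows: List[Tuple[str, int]],
                    info: Dict[str, Record]) -> Dict[str, int]:
    """Sum the bytes of (dst_ip, bytes) pairs per owner label."""
    totals: Dict[str, int] = {}
    for dst_ip, nbytes in flows:
        rec = info.get(dst_ip)
        label = owner_label(rec) if rec else "unknown"
        totals[label] = totals.get(label, 0) + nbytes
    return totals


# Illustrative records using documentation address ranges, not dataset rows.
records = [
    ("192.0.2.10", "cdn.example.net", None, date(2023, 1, 5)),
    ("192.0.2.10", "cdn.example.net", "Example CDN Inc.", date(2023, 3, 1)),
    ("198.51.100.7", "host7.example.org", None, date(2023, 2, 2)),
]
info = latest_by_ip(records)
flows = [("192.0.2.10", 100), ("203.0.113.1", 50), ("192.0.2.10", 20)]
print(bytes_per_owner(flows, info))  # {'Example CDN Inc.': 120, 'unknown': 50}
```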
Table: connection (2021, 2023)
Records changes in Internet connectivity between Wi-Fi and mobile data, with the measurement timestamp and the connection type.
- 1. Field: id
- Description: Foreign key to environmental_data table
- Data type: integer
- 2. Field: timestamp
- Description: Time at which the associated information was recorded
- Data type: timestamp without time zone
- 3. Field: connection_type
- Description: Indicates the type of connection (Mobile, Wifi, Unknown)
- Data type: character varying
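To estimate how long a device stayed on each connection type within a run, one option is to treat each row as the start of a state that lasts until the next row, or until the end of the one-minute window for the last row. That interpretation is an assumption about the table's semantics rather than something stated above; the function name and timestamps below are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def time_per_connection_type(events: List[Tuple[datetime, str]],
                             window_end: datetime) -> Dict[str, float]:
    """Seconds spent in each connection_type before window_end.

    Assumes each row marks a state that lasts until the next row
    (or until window_end for the last row).
    """
    ordered = sorted(events)
    totals: Dict[str, float] = {}
    for i, (ts, ctype) in enumerate(ordered):
        nxt = ordered[i + 1][0] if i + 1 < len(ordered) else window_end
        seconds = max((nxt - ts).total_seconds(), 0.0)
        totals[ctype] = totals.get(ctype, 0.0) + seconds
    return totals


# Illustrative events: Wi-Fi at the start, a switch to mobile data after 40 s.
t0 = datetime(2021, 5, 1, 12, 0, 0)
events = [(t0, "Wifi"), (t0 + timedelta(seconds=40), "Mobile")]
print(time_per_connection_type(events, t0 + timedelta(minutes=1)))
# {'Wifi': 40.0, 'Mobile': 20.0}
```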
Table: cellular (2021, 2023)
Contains information about cellular network quality during the one-minute execution: measurement timestamp, cell type, network technology, and various signal quality indicators.
- 1. Field: id
- Description: Foreign key to environmental_data table
- Data type: integer
- 2. Field: timestamp
- Description: Time at which the data was collected
- Data type: timestamp without time zone
- 3. Field: rssi
- Description: Received Signal Strength Indicator
- Data type: integer
- 4. Field: rsrp
- Description: Reference Signal Received Power
- Data type: integer
- Notes: Only if cell_type==LTE
- 5. Field: rscp
- Description: Received Signal Code Power
- Data type: integer
- Notes: Only if cell_type==WCDMA
- 6. Field: level
- Description: Abstract level value for the overall signal strength
- Data type: integer
- Notes: Integer between 0 and 4
- 7. Field: cqi
- Description: Channel Quality Indicator
- Data type: integer
- Notes: Only if cell_type==LTE
- 8. Field: rsrq
- Description: Reference Signal Received Quality
- Data type: integer
- Notes: Only if cell_type==LTE
- 9. Field: rssnr
- Description: Reference Signal Signal-to-Noise Ratio
- Data type: integer
- Notes: Only if cell_type==LTE
- 10. Field: bit_error_rate
- Description: Bit error rate
- Data type: integer
- Notes: Only if cell_type==WCDMA or cell_type==GSM
- 11. Field: timing_advance
- Description: Timing advance
- Data type: integer
- Notes: Only if cell_type==LTE
- 12. Field: antenna
- Description: Antenna identifier
- Data type: integer
- Notes: Anonymized; if antenna is unknown, the antenna identifier is 0
- 13. Field: cell_type
- Description: Type of cell to which the device is connected
- Data type: character varying(10)
- 14. Field: network_type
- Description: Type of network to which the device is connected
- Data type: character varying(10)
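Several cellular columns are populated only for specific cell types, so comparing signal strength across rows requires choosing the metric per technology. The sketch below follows the notes above (rsrp for LTE, rscp for WCDMA, rssi otherwise), checks that level lies in 0 to 4, and treats antenna 0 as unknown. Rows are modeled as plain dicts, and the exact cell_type strings stored in the data are an assumption to verify; the helper names are hypothetical.

```python
from typing import Optional, Tuple


def primary_signal(row: dict) -> Tuple[str, Optional[int]]:
    """Return the power metric documented for the row's cell_type.

    LTE rows carry rsrp and WCDMA rows carry rscp; any other cell type
    falls back to the generic rssi column.
    """
    cell_type = (row.get("cell_type") or "").upper()
    if cell_type == "LTE":
        return "rsrp", row.get("rsrp")
    if cell_type == "WCDMA":
        return "rscp", row.get("rscp")
    return "rssi", row.get("rssi")


def valid_level(row: dict) -> bool:
    """level is documented as an integer between 0 and 4."""
    level = row.get("level")
    return isinstance(level, int) and 0 <= level <= 4


def has_known_antenna(row: dict) -> bool:
    """antenna == 0 marks an unknown antenna identifier."""
    return row.get("antenna", 0) != 0


# Hypothetical rows, not taken from the dataset.
rows = [
    {"cell_type": "LTE", "rsrp": -95, "rssi": -60, "level": 3, "antenna": 17},
    {"cell_type": "GSM", "rssi": -75, "level": 5, "antenna": 0},
]
for r in rows:
    print(primary_signal(r), valid_level(r), has_known_antenna(r))
# ('rsrp', -95) True True
# ('rssi', -75) False False
```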
Table: WiFi (2020, 2021, 2023)
Provides information about Wi-Fi quality during the one-minute execution: measurement timestamp, Received Signal Strength Indicator, and Wi-Fi frequency.
- 1. Field: id
- Description: Foreign key to environmental_data table
- Data type: integer
- 2. Field: timestamp
- Description: Time to which the data in the row corresponds
- Data type: timestamp without time zone
- 3. Field: frequency
- Description: WiFi frequency
- Data type: double precision
- 4. Field: rssi
- Description: Received Signal Strength Indicator of the Wi-Fi connection
- Data type: integer
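The frequency column makes it possible to separate measurements by Wi-Fi band. The sketch below assumes frequencies are stored in MHz, as Android's WifiInfo.getFrequency() reports them; if the column holds GHz values, multiply by 1000 first. The band boundaries are approximate and the helper names and samples are hypothetical.

```python
from typing import Dict, List, Tuple


def wifi_band(frequency_mhz: float) -> str:
    """Map a channel frequency (assumed MHz) to an approximate band label."""
    if 2400 <= frequency_mhz < 2500:
        return "2.4 GHz"
    if 4900 <= frequency_mhz < 5925:
        return "5 GHz"
    if 5925 <= frequency_mhz <= 7125:
        return "6 GHz"
    return "other"


def mean_rssi_by_band(rows: List[Tuple[float, int]]) -> Dict[str, float]:
    """Average RSSI per band from (frequency, rssi) pairs."""
    groups: Dict[str, List[int]] = {}
    for freq, rssi in rows:
        groups.setdefault(wifi_band(freq), []).append(rssi)
    return {band: sum(vals) / len(vals) for band, vals in groups.items()}


# Illustrative samples, not taken from the dataset.
samples = [(2437.0, -55), (5180.0, -67), (5180.0, -63)]
print(mean_rssi_by_band(samples))  # {'2.4 GHz': -55.0, '5 GHz': -65.0}
```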
Dataset Files
- data_magazine.zip (465.05 MB)
- quic_traffic_group_by_organization.csv (500 bytes)