Open Access
CRAWDAD ctu/personal
- Submitted by: CRAWDAD Team
- Last updated: Thu, 03/15/2012 - 08:00
- DOI: 10.15783/C7059S
Abstract
This dataset contains 142 days of mobile phone records (aka Call Data Records) and ground-truth movement description of Czech Ph.D. student Michal Ficek, stored by his own mobile terminal in 2010-2011.
last modified :
2012-03-15
release date :
2012-03-15
date/time of measurement start :
2010-08-16
date/time of measurement end :
2011-02-06
collection environment :
This dataset contains 142 days of mobile phone records (also known as Call Data Records) and cell transitions (a ground-truth movement description) of Czech Ph.D. student Michal Ficek, stored by his own mobile terminal in 2010-2011. The dataset covers more than 99.99% of 142 days of mobile phone usage in the mobile networks of 8 different providers in 5 countries: the Czech Republic, the Slovak Republic, Germany, Austria and the USA.
network configuration :
The phone was serviced mostly by Vodafone Czech Republic, the user's home network, while in the Czech Republic. The network providers abroad were Orange (Slovakia); A1 Telekom (Austria); T-Mobile Deutschland, Vodafone D2 and O2 (Germany); and T-Mobile and AT&T (USA).
data collection methodology :
The source of the data is the user's own Nokia E52 mobile phone. The publicly available LogExport application was used to record the time and type of communication events (voice, SMS, data). For cell-transition recording, the free CellTrack91 application was used. The coordinates of positions within the cells were obtained by translating the Cell-IDs to geographical coordinates through queries to the Google Location API, as described in our MASS paper.
sanitization :
The Cell Global Identity of the cell the mobile phone is attached to is only partially anonymized. The Mobile Country Code (MCC) and the Mobile Network Code (MNC) keep their original values, so that the country in which the phone was present and the provider that serviced it can be distinguished. The Location Area Code (LAC) and the Cell-ID are anonymized, that is, renumbered according to the time of their first occurrence in the dataset. This approach does not limit the usability of the data, but it spares the mobile providers the concern of having their Cell-IDs exposed together with the approximate geographical coordinates of the cells. This geographical information, the longitude/latitude coordinates of a cell, is not anonymized and thus provides a way to reconstruct a ground-truth movement trajectory of the mobile phone.
limitation :
The spatial accuracy of the data is typical for a cellular network. It depends on cell size and thus varies from tens to hundreds of meters in urban areas to several kilometers in rural areas.
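The renumbering described under sanitization can be illustrated with a minimal Python sketch. It assumes that identifiers are numbered from 1 in order of first appearance and that each identifier type is renumbered on its own; the dataset description states neither detail. The function name and the sample values are hypothetical and are not taken from the dataset.

```python
def renumber_by_first_occurrence(values):
    """Replace each distinct value by its rank of first appearance.

    The starting index of 1 is an assumption; the dataset description
    only says that values are renumbered by time of first occurrence.
    """
    mapping = {}
    renumbered = []
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping) + 1
        renumbered.append(mapping[value])
    return renumbered, mapping


# Hypothetical raw Cell-IDs in chronological order (not real data).
raw_cell_ids = [40211, 40211, 17, 40211, 9003, 17]
anonymized, mapping = renumber_by_first_occurrence(raw_cell_ids)
print(anonymized)
```

Because the mapping depends only on the order of first appearance, two records share an anonymized value exactly when they shared the original one, which is why the renumbering does not limit mobility analyses.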
Traceset
ctu/personal/mobile
Mobile phone records of Czech Ph.D. student Michal Ficek collected in 2010-2011.
- files: ficek_personal_communication.csv.gz, ficek_personal_movement.csv.gz
- description: This traceset contains 142 days of mobile phone records (aka Call Data Records) and ground-truth movement description of Czech Ph.D. student Michal Ficek, stored by his own mobile terminal in 2010-2011.
- measurement purpose: User Mobility Characterization, Usage Characterization, Positioning Systems, Social Network Analysis, Human Behavior Modeling, Localization
- methodology: On a Nokia E52 mobile phone (firmware version 054.003) we ran the publicly available application LogExport 1.1 UTC (http://tinyhack.com/freewarelist/s603rd/2007/03/02/logexport/) to record both the time and type of communication events. For recording cell transitions, the free CellTrack91 1.0.9 (http://www.afischer-online.de/sos/celltrack/) application was used. Every week during the measurement period the data from both applications were stored, and the cell coordinates were obtained from the Google Location API. The mobile phone was always carried by the dataset author.
- sanitization: The Cell Global Identity of the cell the mobile phone is attached to is only partially anonymized. The Location Area Code (LAC) and the Cell-ID are anonymized, that is, renumbered according to the time of their first occurrence in the dataset. The Mobile Country Code (MCC) and Mobile Network Code (MNC) remain intact and are not anonymized.
- last modified: 2012-03-15
- dataname: ctu/personal/mobile
- version: 20120315
- change: the initial version.
- release date: 2012-03-15
- date/time of measurement start: 2010-08-16
- date/time of measurement end: 2011-02-06
- limitation: The spatial accuracy of the data is typical for a cellular network. It depends on cell size and thus varies from tens to hundreds of meters in urban areas to several kilometers in rural areas.
- hole: There are only three gaps in the data, when the cell-tracking application was turned off by accident: from 02-Oct-2010 22:42:06 to 03-Oct-2010 07:58:04, from 05-Oct-2010 15:08:42 to 05-Oct-2010 15:22:42, and from 09-Oct-2010 13:40:18 to 09-Oct-2010 15:49:32. Otherwise, the mobile phone was never switched off during the measurement period, except on board a plane while airborne.
- error: The positions within the cells were obtained by querying the Google Location API. In our MASS paper, we showed, by comparing with data obtained from a large and cooperating mobile network provider, that the accuracy of this approach comes close to the cellular network operator's own approximation of the position inside a cell.
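The three gaps listed under hole can be encoded as intervals so that an analysis can skip them. The following is a minimal Python sketch; it assumes the gap timestamps use the same time reference as the trace timestamps, which the description does not state explicitly. The names KNOWN_GAPS and in_known_gap are illustrative only.

```python
from datetime import datetime

# Gap boundaries copied from the "hole" entry above.
# ASSUMPTION: same time reference as the trace timestamps.
KNOWN_GAPS = [
    (datetime(2010, 10, 2, 22, 42, 6), datetime(2010, 10, 3, 7, 58, 4)),
    (datetime(2010, 10, 5, 15, 8, 42), datetime(2010, 10, 5, 15, 22, 42)),
    (datetime(2010, 10, 9, 13, 40, 18), datetime(2010, 10, 9, 15, 49, 32)),
]


def in_known_gap(timestamp):
    """Return True if the timestamp falls inside one of the documented gaps."""
    return any(start <= timestamp <= end for start, end in KNOWN_GAPS)


# Example: a communication event during the first gap has no reliable
# cell position, so it may be better excluded from location analyses.
print(in_known_gap(datetime(2010, 10, 3, 1, 0, 0)))
```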
ctu/personal/mobile Traces
- 2010: Mobile phone records of Czech Ph.D. student Michal Ficek collected in 2010-2011.
- configuration: We used the application LogExport 1.1 running on a mobile phone Nokia E52 (fw 054.003).
- format: The communication trace, ficek_personal_communication.csv, consists of timestamped records for every voice, text message and data communication, either outgoing or incoming. The movement trace, ficek_personal_movement.csv, contains a timestamped list with the full Cell Global Identity of the cell the phone was attached to (Mobile Country Code, Mobile Network Code, Location Area Code, and Cell-ID), and the approximate geographical coordinates of the corresponding cell tower (longitude, latitude), which are not anonymized. Each file has 1 header row.
  ficek_personal_communication.csv contains the following fields (fields 1-5): "YYYYMMDD","hhmmss (UTC+0)","Type","Direction","Duration".
  - The time field "hhmmss" represents the GMT time.
  - The type of communication is either "Voice", "SMS" or "Data".
  - The communication direction in the "Direction" field is either "Outgoing" (call made, SMS sent, data session started), "Incoming" (call or SMS received), or "Missed call".
  - The "Duration" field stores the duration in seconds of a call or a data session.
  ficek_personal_movement.csv contains the fields "YYYYMMDD","hhmmss (UTC+0)","MCC", "MNC","LAC","CID","Latitude","Longitude","Timezone".
  - The time field "hhmmss" represents the GMT time.
  - "MCC" stands for the Mobile Country Code, "MNC" for the Mobile Network Code, "LAC" for the Location Area Code, and "CID" for the Cell-ID.
  - To get the local time, the "Timezone" field must be added to the UTC time. The timezone field already contains the daylight saving time (DST) adjustment.
  - If MCC=0 and MNC=0, the mobile phone is at a place without signal coverage.
  - If the Latitude and Longitude fields equal zero, the coordinates for the corresponding cell are unknown.
- description: This trace covers 142 days of mobile phone usage by Czech Ph.D. student Michal Ficek, stored by his own mobile terminal in 2010-2011.
- last modified: 2012-03-15
- dataname: ctu/personal/mobile/2010
- version: 20120315
- change: the initial version
- release date: 2012-03-15
- date/time of measurement start: 2010-08-16
- date/time of measurement end: 2011-02-06
- hole: There are only three gaps in the data, when the cell-tracking application was turned off by accident: from 02-Oct-2010 22:42:06 to 03-Oct-2010 07:58:04, from 05-Oct-2010 15:08:42 to 05-Oct-2010 15:22:42, and from 09-Oct-2010 13:40:18 to 09-Oct-2010 15:49:32. Otherwise, the mobile phone was never switched off during the measurement period, except on board a plane while airborne.
- limitation: The spatial accuracy of the data is typical for a cellular network. It depends on cell size and thus varies from tens to hundreds of meters in urban areas to several kilometers in rural areas. We are aware of two situations where the geographical coordinates of cells in the data do not correspond to their actual coordinates. 1) Due to the nature of the cell-retrieval method, the coordinates of about 13 cells (out of approx. 3700 cells) were not found by the Google Location API and thus are missing in the trace. Such records have the MCC, MNC, LAC and CID fields filled, but their Longitude and Latitude fields are set to zero. 2) For a specific reason, all cells that cover the different subway stations in Prague, the capital of the Czech Republic, share the same geographical coordinates (50.074297, 14.428297). In fact, they are distributed all around Prague.
- sanitization: The phone numbers of parties communicating with the mobile phone are not present. The Cell Global Identity of the cell the mobile phone is attached to is partially anonymized. The Mobile Country Code (MCC) and Mobile Network Code (MNC) remain intact and are not anonymized.
- error: The positions within the cells were obtained by querying the Google Location API. In our MASS paper, we showed, by comparing with data obtained from a large and cooperating mobile network provider, that the accuracy of this approach comes close to the cellular network operator's own approximation of the position inside a cell.
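The movement-trace conventions given under format and limitation (one header row, UTC timestamps, the "Timezone" offset, zero values for missing coverage or coordinates, and the shared subway coordinates) can be applied as in the following Python sketch. It rests on several assumptions that the description does not confirm: the "Timezone" field is taken to be in hours (adjustable through the hypothetical tz_unit parameter), time values are zero-padded to six digits if needed, and the two sample rows are invented for illustration rather than copied from the trace. All function and key names are hypothetical.

```python
import csv
import io
from datetime import datetime, timedelta

# Coordinates shared by all Prague subway cells, per the limitation note.
SUBWAY_PLACEHOLDER = (50.074297, 14.428297)


def parse_movement_row(row, tz_unit=timedelta(hours=1)):
    """Turn one 9-field movement row into a dict.

    ASSUMPTION: the unit of the "Timezone" field is not documented;
    hours are assumed here and can be overridden through tz_unit.
    """
    date, time_, mcc, mnc, lac, cid, lat, lon, tz = (f.strip() for f in row)
    utc = datetime.strptime(date + time_.zfill(6), "%Y%m%d%H%M%S")
    lat, lon = float(lat), float(lon)
    return {
        "utc": utc,
        "local": utc + tz_unit * float(tz),  # Timezone already includes DST
        "mcc": int(mcc), "mnc": int(mnc),
        "lac": lac, "cid": cid,              # anonymized, kept as text
        "lat": lat, "lon": lon,
        "no_coverage": int(mcc) == 0 and int(mnc) == 0,
        "coords_known": not (lat == 0.0 and lon == 0.0),
        "subway_placeholder": (
            abs(lat - SUBWAY_PLACEHOLDER[0]) < 1e-6
            and abs(lon - SUBWAY_PLACEHOLDER[1]) < 1e-6
        ),
    }


def read_movement(fileobj):
    """Read all records from an open text file, skipping the header row."""
    reader = csv.reader(fileobj, skipinitialspace=True)
    next(reader)  # each file has 1 header row
    return [parse_movement_row(row) for row in reader if row]


# Hypothetical sample in the documented field order (not real data).
sample = io.StringIO(
    '"YYYYMMDD","hhmmss (UTC+0)","MCC", "MNC","LAC","CID",'
    '"Latitude","Longitude","Timezone"\n'
    "20100816,101500,230,3,1,1,50.074297,14.428297,2\n"
    "20100816,102000,0,0,0,0,0,0,2\n"
)
records = read_movement(sample)
print(records[0]["local"], records[1]["no_coverage"])
```

For the published file, the same reader could be fed from `gzip.open("ficek_personal_movement.csv.gz", "rt", newline="")`. Records flagged as no_coverage, as lacking known coordinates, or as carrying the subway placeholder are probably best handled separately before any trajectory is reconstructed.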
The files in this directory are a CRAWDAD dataset hosted by IEEE DataPort.
About CRAWDAD: the Community Resource for Archiving Wireless Data At Dartmouth is a data resource for the research community interested in wireless networks and mobile computing.
CRAWDAD was founded at Dartmouth College in 2004, led by Tristan Henderson, David Kotz, and Chris McDonald. CRAWDAD datasets are hosted by IEEE DataPort as of November 2022.
Note: Please use the Data in an ethical and responsible way with the aim of doing no harm to any person or entity for the benefit of society at large. Please respect the privacy of any human subjects whose wireless-network activity is captured by the Data and comply with all applicable laws, including without limitation such applicable laws pertaining to the protection of personal information, security of data, and data breaches. Please do not apply, adapt or develop algorithms for the extraction of the true identity of users and other information of a personal nature, which might constitute personally identifiable information or protected health information under any such applicable laws. Do not publish or otherwise disclose to any other person or entity any information that constitutes personally identifiable information or protected health information under any such applicable laws derived from the Data through manual or automated techniques.
Please acknowledge the source of the Data in any publications or presentations reporting use of this Data.
Citation:
Michal Ficek, ctu/personal, https://doi.org/10.15783/C7059S , Date: 20120315
Dataset Files
- ficek_personal_communication.csv.gz (23.81 kB)
- ficek_personal_movement.csv.gz (318.12 kB)
Documentation
| Attachment | Size |
|---|---|
| ctu-personal-readme.txt | 1.55 KB |
These datasets are part of Community Resource for Archiving Wireless Data (CRAWDAD). CRAWDAD began in 2004 at Dartmouth College as a place to share wireless network data with the research community. Its purpose was to enable access to data from real networks and real mobile users at a time when collecting such data was challenging and expensive. The archive has continued to grow since its inception, and starting in summer 2022 is being housed on IEEE DataPort.