Open Access
CRAWDAD strath/nodobo
- Submitted by: CRAWDAD Team
- Last updated: Tue, 07/05/2011 - 08:00
- DOI: 10.15783/C7HP4Q
- Collection: CRAWDAD
Abstract
Dataset of mobile phone usage records collected with the Nodobo suite at the University of Strathclyde.
The dataset was gathered by Nodobo, a suite of social sensor software for Android phones, during a study of mobile phone usage at the University of Strathclyde.
date/time of measurement start: 2010-09-09
date/time of measurement end: 2011-02-23
collection environment: Our researchers developed "Nodobo", a set of software extensions to the Google Android operating system that enables the capture and replay of smartphone user interaction sessions. The software captures a variety of social context data, including logs of phone calls, text messages, Bluetooth proximity detections, WiFi access points, and cell tower IDs. The direction of each call and text message is recorded, along with the associated phone number and the duration of the call or length of the message. Bluetooth proximity is detected every minute, and includes all devices in the study as well as any other clients that respond to service discovery. Basic positioning is achieved through WiFi hotspot and cell tower ID records.
network configuration: Each of the study participants was given a Google Nexus One smartphone, prepared with a modified Android operating system. Data is stored in a simple database on the device SD card, which is then synchronised over the air to a central server.
data collection methodology: The dataset was collected by monitoring the devices of 27 users over a 5-month study.
sanitization: Record fields containing personally identifiable information have been anonymised.
Nodobo-2011-01-v1 is the traceset gathered by the Nodobo software at the University of Strathclyde from September 2010 to February 2011.
Traceset
strath/nodobo/mobile
Traceset of mobile phone usage records collected with the Nodobo suite at the University of Strathclyde.
- file: nodobo-release.tar.gz, nodobo-csv.zip
- description: Nodobo-2011-01-v1 is the traceset gathered by the Nodobo software at the University of Strathclyde from September 2010 to February 2011.
- measurement purpose: Usage Characterization, Social Network Analysis
- methodology: A group of 27 promising students at a Scottish state high school was selected for this study. All of the students already had a mobile phone, and approximately one third of these phones were smartphones (iPhone, Blackberry, or a similarly powerful handset). Each of the study participants was given a Google Nexus One smartphone, prepared with a modified Android operating system.
The close proximity of the deployment to the University of Strathclyde enables the study organisers to schedule regular visits to diagnose issues and to make regular backups. To keep the dataset as up to date as possible, and to limit the number of visits required, the devices also synchronise with a web server over the mobile network or WiFi.
- sanitization: Record fields containing personally identifiable information have been anonymised.
strath/nodobo/mobile Trace
- social: Mobile phone usage records collected with the Nodobo suite at the University of Strathclyde in 2010-2011.
- configuration: 27 Google Nexus One smartphones were prepared with a modified Android operating system, running Nodobo. The phone database is synchronised periodically over-the-air with a web services data store.
- format: db.sqlite3.dump.bz2 is a bzip2-compressed SQL dump of the sqlite3 database. To recreate the database, run:
    bzcat db.sqlite3.dump.bz2 | sqlite3 db.sqlite3
# Database schema
The following tables are used:
## Calls and Messages
* other_id: id of the other user on the call (NULL if not in the study)
* number: phone number of the other end of the call/message (related: Users#number)
* duration: length of the call in seconds
* length: number of characters in the message
## CellTowers
* cellid: GSM base transceiver station CID
* lac: location area code
## Devices
* imei: blank for this release of the data
* mac: Bluetooth MAC (related: Presences#mac)
## Presences
* other_id: user_id of the detected device (NULL if not in the study)
* mac: Bluetooth MAC (related: Devices#mac)
* bluetooth_class: reported class of the device
* name: human-readable name of the device
## Users
* name: "Anonymous" for this release of the data
* number: phone number of the study user (related: Calls#number, Messages#number)
## Wifis
* ssid: human-readable name of the base station
* bssid: base station MAC
## All tables
* The database schema follows ActiveRecord conventions: table names are plural, foreign keys are named singular_id, and each table has an id primary key and created_at/updated_at timestamps.
* user_id indicates which user recorded the interaction.
* The calls and messages tables have two timestamp columns. The call_timestamp/message_timestamp column is the time recorded by the phone when the call/message was originally logged. The timestamp column is the time at which the calldb/smsdb synchronisation occurred (which is less useful).
* Some tables have an "interaction" column. This was used for database synchronisation and is left in for internal debugging purposes.
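The difference between the two timestamp columns can be illustrated with a minimal plain-Ruby sketch. It does not touch the real database: the `CallRecord` struct, the sample rows, and the "YYYY-MM-DD HH:MM:SS" timestamp format are all assumptions made for illustration, and only the column names come from the schema notes above.

```ruby
require "time"

# Hypothetical rows shaped like the calls table; every value is made up.
CallRecord = Struct.new(:user_id, :other_id, :duration, :call_timestamp, :timestamp)

calls = [
  CallRecord.new(19, 21,  64,  "2010-10-02 18:30:00", "2010-10-03 02:00:00"),
  CallRecord.new(19, nil, 12,  "2010-10-01 09:15:00", "2010-10-03 02:00:00"),
  CallRecord.new(21, 19,  300, "2010-10-02 20:05:00", "2010-10-02 23:45:00")
]

# Parse as UTC so the arithmetic does not depend on the local time zone.
# (Assumed format; check the actual column values before relying on this.)
def parse_utc(text)
  Time.parse("#{text} UTC")
end

# Order events by when the phone logged them, not by when they were synced.
chronological = calls.sort_by { |c| parse_utc(c.call_timestamp) }

# Seconds between a call being logged on the phone and its synchronisation.
sync_lag = calls.map { |c| parse_utc(c.timestamp) - parse_utc(c.call_timestamp) }

puts chronological.map(&:call_timestamp).inspect
puts sync_lag.inspect
```

Sorting on call_timestamp keeps calls in the order they happened, whereas several calls uploaded in the same synchronisation run share one timestamp value.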
# Software and studies
Also included in the dataset download are programs for three sample studies. These are detailed below.
Each program can be run with ruby: for example, "ruby conversation-length.rb". The programs assume that your current working directory is the one with the database and the nodobo.rb code.
Software used:
* Ruby 1.8.7 or later, with gems: activerecord, sqlite3-ruby, progressbar
* gnuplot 4.4
* GraphViz 2.22
## Ruby interface: nodobo.rb
We have supplied a simple ActiveRecord interface to the database, "nodobo.rb". This gives classes and relations for each of the types of data in the dataset.
The interface can be used by running "irb -r ./nodobo.rb", or by using "require 'nodobo'" in your own programs. A sample irb session is given below:
    >> u = User.find(19)
    => #<User ...>
    >> u.calls.size
    => 976
    >> study_calls = u.calls.select {|c| c.other != nil }; study_calls.size
    => 133
    >> Hash[study_calls.group_by(&:other_id).map {|k,v| [k, v.size]}]
    => {16=>2, 19=>1, 25=>2, 14=>4, 21=>124}
    >> v = User.find(21)
    => #<User ...>
    >> v.calls.select {|c| c.other != nil }.size
    => 175
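For readers without the ActiveRecord gems installed, the per-contact count from the session above can be sketched in plain Ruby. The `LoggedCall` struct, the `study_contact_counts` helper, and the sample rows are hypothetical stand-ins, not part of nodobo.rb. The only detail taken from the dataset is that other_id is NULL (nil here) when the other party is not in the study.

```ruby
# Hypothetical stand-in for rows of the calls table; the values are made up.
LoggedCall = Struct.new(:user_id, :other_id)

calls = [
  LoggedCall.new(19, 21),
  LoggedCall.new(19, nil), # other party is not a study participant
  LoggedCall.new(19, 21),
  LoggedCall.new(19, 14),
  LoggedCall.new(21, 19)
]

# Count how many calls a user recorded with each other study participant.
def study_contact_counts(calls, user_id)
  calls
    .select { |c| c.user_id == user_id && !c.other_id.nil? }
    .group_by(&:other_id)
    .map { |other_id, rows| [other_id, rows.size] }
    .to_h
end

counts = study_contact_counts(calls, 19)
puts counts.inspect
```

The same select, group_by, and map chain works on the arrays returned by the ActiveRecord relations, which is what the irb session does.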
- sanitization: The following fields have been altered to remove personal information from the dataset:
* Call#number, Message#number, User#number
* Device#mac, Presence#mac
* Wifi#bssid
* Presence#name
* Wifi#ssid
* CellTower#cellid
* CellTower#lac
Each real value for these fields maps 1:1 to a randomly-generated anonymous value. The process for generating these values is as follows:
* Phone number: random number with the same number of digits; if the original number has 3 or more digits, keep the original first 2 digits
* MAC address: 12 random hex digits
* Bluetooth name/Wifi ssid: random sequence of dictionary words, same number of words as the original name
* Cell ID and LAC: random number with the same number of digits
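The rules above can be sketched in Ruby roughly as follows. This is an illustration of the described process, not the code the dataset authors used: the `Pseudonymiser` class, its method names, the tiny word list, the seed, and the retry limit are all assumptions. Inputs are assumed to be strings, with phone numbers and cell values containing digits only.

```ruby
# Illustrative 1:1 pseudonymisation table (assumed design, not the original tool).
class Pseudonymiser
  # Stand-in dictionary; a real run would use a much larger word list.
  WORDS = %w[amber birch cedar delta ember flint grove heron].freeze

  def initialize(seed = 42)
    @rng = Random.new(seed)                   # seeded so the sketch is repeatable
    @forward = Hash.new { |h, k| h[k] = {} }  # kind => { real value => anonymous value }
    @used = Hash.new { |h, k| h[k] = {} }     # kind => { anonymous value => true }
  end

  # Same number of digits; numbers with 3 or more digits keep their first 2.
  def phone(number)
    remember(:phone, number) do
      keep = number.length >= 3 ? number[0, 2] : ""
      keep + random_digits(number.length - keep.length)
    end
  end

  # 12 random hex digits, regardless of the original address.
  def mac(address)
    remember(:mac, address) { Array.new(12) { @rng.rand(16).to_s(16) }.join }
  end

  # Random dictionary words, as many as the original name has.
  def name(text)
    remember(:name, text) do
      Array.new(text.split.size) { WORDS[@rng.rand(WORDS.size)] }.join(" ")
    end
  end

  # Cell ID or LAC: random number with the same number of digits.
  def cell_value(value)
    remember(:cell, value) { random_digits(value.length) }
  end

  private

  def random_digits(count)
    Array.new(count) { @rng.rand(10).to_s }.join
  end

  # Reuse an existing mapping, or draw candidates until one is unused,
  # so that two different real values never share an anonymous value.
  def remember(kind, real)
    table = @forward[kind]
    return table[real] if table.key?(real)

    100.times do
      candidate = yield
      next if @used[kind].key?(candidate)

      @used[kind][candidate] = true
      return table[real] = candidate
    end
    raise "could not find an unused anonymous value for #{kind}"
  end
end

anonymiser = Pseudonymiser.new
puts anonymiser.phone("07700900123")
puts anonymiser.mac("00:11:22:33:44:55")
```

Keeping a lookup table per field type is what makes the mapping 1:1: the same real value always yields the same anonymous value, so relations such as Calls#number to Users#number still line up after anonymisation.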
The files in this directory are a CRAWDAD dataset hosted by IEEE DataPort.
About CRAWDAD: the Community Resource for Archiving Wireless Data At Dartmouth is a data resource for the research community interested in wireless networks and mobile computing.
CRAWDAD was founded at Dartmouth College in 2004, led by Tristan Henderson, David Kotz, and Chris McDonald. CRAWDAD datasets are hosted by IEEE DataPort as of November 2022.
Note: Please use the Data in an ethical and responsible way with the aim of doing no harm to any person or entity for the benefit of society at large. Please respect the privacy of any human subjects whose wireless-network activity is captured by the Data and comply with all applicable laws, including without limitation such applicable laws pertaining to the protection of personal information, security of data, and data breaches. Please do not apply, adapt or develop algorithms for the extraction of the true identity of users and other information of a personal nature, which might constitute personally identifiable information or protected health information under any such applicable laws. Do not publish or otherwise disclose to any other person or entity any information that constitutes personally identifiable information or protected health information under any such applicable laws derived from the Data through manual or automated techniques.
Please acknowledge the source of the Data in any publications or presentations reporting use of this Data.
Citation:
Alisdair McDiarmid, James Irvine, Stephen Bell, Jamie Banford, strath/nodobo, https://doi.org/10.15783/C7HP4Q, Date: 20110323
Copyright (c) 2011 University of Strathclyde
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
Dataset Files
- nodobo-release.tar.gz (41.47 MB)
- presences.csv (482.01 MB)
- messages.csv (5.29 MB)
- calls.csv (840.20 kB)
- nodobo-csv.zip (32.42 MB)
Documentation
| Attachment | Size |
|---|---|
| strath-nodobo-readme.txt | 1.6 KB |
These datasets are part of the Community Resource for Archiving Wireless Data (CRAWDAD). CRAWDAD began in 2004 at Dartmouth College as a place to share wireless network data with the research community. Its purpose was to enable access to data from real networks and real mobile users at a time when collecting such data was challenging and expensive. The archive has continued to grow since its inception, and since summer 2022 it has been housed on IEEE DataPort.