Dataset of contact patterns among students, collected during the Spring semester of 2006 at the National University of Singapore.
The authors inferred the contact patterns among 22341 students from class schedules and class rosters for the Spring semester of 2006 at the National University of Singapore.
date/time of measurement start: 2006-01-09
date/time of measurement end: 2006-05-06
collection environment: We collected information on class schedules and class rosters for the Spring 2006 semester, in which 22341 students were enrolled. Our university, the National University of Singapore, has different colleges (e.g., Engineering, Science, Law) and within each college there are departments (e.g., Electrical and Computer Engineering, Computer Science). Every department offers graduate and undergraduate degrees, and face-to-face classes are an integral part of these programs. Many classes also have labs and recitations associated with them. For large classes, several recitation sessions are offered and students sign up for the recitation session that is most convenient for them. The same goes for the labs. At the time of writing, all lessons are conducted on the main campus of the university at Kent Ridge, spanning an area of 146 hectares.
network configuration: Our insight is that accurate information of human contact patterns is available in several special scenarios such as university campuses. If one knows the class schedules and enrollment of students for each class on a campus, it gives us extremely accurate information about contact patterns between students over large time scales. We obtain this information about student enrollment and class schedules from our university.
We can now describe how we infer the contact patterns among students inside classrooms. The rule is simple: two students are in contact with each other if and only if they are in the same venue at the same time. In other words, we assume that as long as two students are in the same classroom, they are within Bluetooth range of each other. This assumption has been validated inside large classrooms on our campus. We also assume that two students who are in different classrooms are out of range of each other, even if one classroom is just next door to the other. We further assume that contacts take place only during business hours, and ignore the fact that students stay on campus for various activities after hours. We note that the last two assumptions are conservative: the number of contacts we obtained is a lower bound on the actual contacts that take place on campus.
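The same-venue, same-time rule can be illustrated with a minimal Python sketch. The tuple layout `(venue, start, duration, students)` and the function name `contacts_from_sessions` are assumptions made for this illustration; they are not taken from the authors' tooling.

```python
from itertools import combinations


def contacts_from_sessions(sessions):
    """Map each student pair (a, b), a < b, to the hours they share a venue.

    Each session is a hypothetical tuple (venue, start, duration, students),
    with start and duration given in whole hours.
    """
    # Collect who is present in each (venue, hour) slot.
    occupancy = {}
    for venue, start, duration, students in sessions:
        for hour in range(start, start + duration):
            occupancy.setdefault((venue, hour), set()).update(students)

    # Two students are in contact iff they occupy the same slot.
    contacts = {}
    for (_venue, hour), present in occupancy.items():
        for a, b in combinations(sorted(present), 2):
            contacts.setdefault((a, b), []).append(hour)
    return contacts


# Illustrative data only: three sessions in two venues.
sessions = [
    ("LT1", 0, 2, [1, 2, 3]),
    ("LT2", 0, 1, [4, 5]),
    ("LT1", 2, 1, [3, 4]),
]
contacts = contacts_from_sessions(sessions)
```

Students 1 and 4 never share a venue during the same hour in this example, so the pair `(1, 4)` does not appear in `contacts`, even though both attend sessions in `LT1`.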
The procedure above gives us human contact patterns among students. From these contact patterns, we can infer contact patterns between mobile devices and explore hypothetical questions about the performance of algorithms.
data collection methodology: Our university has a central Intranet portal for teaching, called Integrated Virtual Learning Environment (IVLE). The Intranet portal hosts a web site for every class that is taught on campus. Professors manage the web site for their respective classes and post lecture notes, quizzes, solutions etc. on their class web site. Information about students enrolled and the schedule for the class is posted on the web site for each class. We wrote a Perl script to harvest this data. For each student we stored information about the classes he was registered for, the start and end time of the class and its venue.
sanitization: We anonymize the identity of the students using MD5.
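As a rough illustration of MD5-based anonymization, the sketch below hashes an identifier string with Python's standard `hashlib`. The description does not say what input string was hashed or whether a salt was used, so the identifier format and the function name `anonymize` are assumptions.

```python
import hashlib


def anonymize(student_id: str) -> str:
    """Return the hex MD5 digest of a (hypothetical) student identifier."""
    return hashlib.md5(student_id.encode("utf-8")).hexdigest()


# "u0123456" is a made-up identifier used only for illustration.
token = anonymize("u0123456")
```

The same identifier always maps to the same 32-character token, so enrollments of one student across classes stay linked after anonymization.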
Traceset of contact patterns among students, collected during the Spring semester of 2006 at the National University of Singapore.
- file: mobicom06-trace.txt.gz
- description: The authors inferred the contact patterns among 22341 students from class schedules and class rosters for the Spring semester of 2006 at the National University of Singapore.
- measurement purpose: User Mobility Characterization
- methodology: For each class, we obtained the sessions associated with the class, and the students enrolled in the class. A session can be of a certain type, for instance, a lecture session, a recitation session or a laboratory session. A class can have multiple sessions of each type. Sessions of the same type can be grouped into a session group. For instance, a class may hold two lecture sessions (delivering different content) in a week for the same set of students. Both these lecture sessions are said to belong to the same session group. On the other hand, a class with a large number of students may hold two lecture sessions (delivering the same content) in a week for different batches of students. These lecture sessions are considered to be in different session groups. A student signs up for a session group for each type of session in a class he is enrolled in, and is expected to attend all sessions within that session group.
Our Intranet portal does not provide detailed information about which session group a student has signed up for. To fill in these details, we randomly assign a student to a session group. To be more specific, given a student s, for each class c that s has enrolled in, for each session type t of c, s randomly and independently signs up for a session group of type t, and attends all sessions of that session group.
Our random assignment of students to session groups might result in conflicts - that is, a student might have signed up for two sessions which are held at the same time. We adopt a simple approach to deal with such conflicts. If a session group assigned to a student leads to a conflict, the student is randomly assigned to another session group of the same type. If it is impossible to resolve a conflict, the student will not be attending any session group of that type. In our trace, only 3% of all assignments resulted in unresolved conflicts.
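One possible reading of the random assignment and conflict handling is sketched below in Python. The `catalog` layout, the function name `assign_student`, and the order in which classes and session types are processed are assumptions for illustration; the description does not specify them. Trying the groups of a type in random order stands in for "assign randomly, then reassign randomly on conflict".

```python
import random


def assign_student(enrolled, catalog, rng):
    """Pick one session group per (class, session type) for one student.

    catalog[class_id][session_type] is a list of session groups; each group
    is a list of (start_hour, duration) sessions in a weekly timetable.
    Returns (chosen, unresolved): chosen maps (class_id, session_type) to a
    group index; unresolved lists the pairs that could not be placed.
    """
    busy = set()  # weekly hours already occupied by this student
    chosen, unresolved = {}, []
    for class_id in enrolled:
        for session_type, groups in catalog[class_id].items():
            order = list(range(len(groups)))
            rng.shuffle(order)  # random first choice, then random fallbacks
            for g in order:
                hours = {h for start, dur in groups[g]
                         for h in range(start, start + dur)}
                if busy.isdisjoint(hours):
                    busy |= hours
                    chosen[(class_id, session_type)] = g
                    break
            else:
                # Every group of this type conflicts: the student attends none.
                unresolved.append((class_id, session_type))
    return chosen, unresolved


# Illustrative catalog: class names and hours are made up.
catalog = {
    "CS101": {"lecture": [[(0, 2), (10, 1)]], "lab": [[(0, 1)], [(5, 1)]]},
    "MA102": {"lecture": [[(5, 1)]]},
}
chosen, unresolved = assign_student(["CS101", "MA102"], catalog, random.Random(0))
```

In this example the first lab group clashes with the lecture, so the student lands in the second lab group; the only MA102 lecture then clashes with that lab and is left unresolved.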
After both screen scraping and session assignment, we have a view of which student is attending which session at what time. This data provides us with in-class activity of a student for a week. We further simplify the model in several ways. Firstly, most sessions start on the hour and end on the hour. For the few sessions that do not, we round up the starting time and ending time of the sessions to the nearest hour. This simplification allows us to use one hour as one unit time. Secondly, we "compress" the time by removing any idle time slots without any active sessions. For example, suppose the last session of Monday ends at 9pm, and the first session of Tuesday starts at 8am. If Monday 8pm to 9pm corresponds to the 10th hour, then Tuesday 8am to 9am is the 11th hour in our model. This concept is similar to business days, which count the number of days excluding weekends and public holidays. We refer to our compressed time unit as a business hour. By compressing the time, we can remove any effects introduced by idle hours during the night and during weekends. For the rest of this paper, when we use the unit hours, we are referring to business hours. Finally, class activities which are held every fortnight are assumed to be held weekly for simplicity.
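The time compression can be sketched as follows, assuming wall-clock hours are encoded as integers counted from Monday 00:00. That encoding and the function name `business_hour_index` are assumptions for illustration.

```python
def business_hour_index(sessions):
    """Map each wall-clock hour with an active session to a business hour.

    sessions is an iterable of (start_hour, duration) pairs, where
    start_hour counts hours from Monday 00:00 (a hypothetical encoding).
    Hours with no active session are dropped from the timeline.
    """
    active = set()
    for start, duration in sessions:
        active.update(range(start, start + duration))
    # Number the surviving hours consecutively, in chronological order.
    return {hour: i for i, hour in enumerate(sorted(active))}


# Monday 8am-10am, Monday 8pm-9pm, Tuesday 8am-9am (hour 32 = 24 + 8).
index = business_hour_index([(8, 2), (20, 1), (32, 1)])
```

Here Monday 8pm to 9pm and Tuesday 8am to 9am map to consecutive business hours, because the idle night hours between them are removed.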
- hole: For a few classes, there are inconsistencies in the way data is stored on the class web sites. For example, the schedule information is not available. Large classes (e.g., > 500 students) have different lecture sessions and we do not have information on which lecture sessions these students have signed up for. Also, for a given class, we do not have information on which students have signed up for which recitation and laboratory. We dealt with these issues by defining "session type" and "session group" and applying "random assignment" when the information is not sufficient (see the methodology description above for details).
- limitation: The data we obtained from the Intranet portal gives us the session schedule of students, from which we can infer the contact patterns of students inside the classrooms. Students, however, are likely to come into contact with each other outside of class as well, for instance at dining halls or libraries. The class schedules and rosters do not provide us with such information.
spring06: The authors inferred the contact patterns among 22341 students from class schedules and class rosters for the Spring semester of 2006 at the National University of Singapore.
- configuration: We obtain the class schedules and class rosters from a university-wide Intranet learning portal, and use this information to infer contacts made between students. This trace contains the contact patterns among 22341 students. See the methodology description of the traceset nus/contact/sessions for more details.
- format: The trace data is stored as a text file containing a series of integers.
The first line in the text file consists of two integers. The first integer gives the total number of sessions n, and the second integer gives the total number of students k. The subsequent 2n lines give information about sessions, sorted by the start time of sessions. Each session is described in two lines. The first line holds four integers: the start time of the session (in business hours, starting from hour 0), the session id i (numbered from 0 to n-1), the number of students in the session s, and the duration (in hours). The next line lists the s student ids (numbered from 0 to k-1) of the students who registered for session i.
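A minimal Python sketch of a reader for this format is given below. It assumes the integers are separated by whitespace; the function names `parse_trace` and `load_trace` and the dictionary keys are chosen for illustration only.

```python
import gzip
import io


def parse_trace(lines):
    """Parse the trace format described above from an iterable of text lines.

    Returns (n, k, sessions), where each session is a dict with the keys
    "start", "id", "duration" and "students".
    """
    it = iter(lines)
    n, k = map(int, next(it).split())
    sessions = []
    for _ in range(n):
        # Header line: start time, session id, student count, duration.
        start, session_id, size, duration = map(int, next(it).split())
        # Following line: the ids of the students registered for the session.
        students = [int(x) for x in next(it).split()]
        if len(students) != size:
            raise ValueError(f"session {session_id}: expected {size} ids")
        sessions.append({"start": start, "id": session_id,
                         "duration": duration, "students": students})
    return n, k, sessions


def load_trace(path):
    """Read a gzip-compressed trace file such as mobicom06-trace.txt.gz."""
    with gzip.open(path, "rt") as f:
        return parse_trace(f)


# Tiny made-up sample in the same layout: 2 sessions, 3 students.
sample = "2 3\n0 0 2 1\n0 1\n1 1 2 2\n1 2\n"
n, k, sessions = parse_trace(io.StringIO(sample))
```

With the real file, `load_trace("mobicom06-trace.txt.gz")` would return the same structure, provided the whitespace assumption holds.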
The files in this directory are a CRAWDAD dataset hosted by IEEE DataPort.
About CRAWDAD: the Community Resource for Archiving Wireless Data At Dartmouth is a data resource for the research community interested in wireless networks and mobile computing.
CRAWDAD was founded at Dartmouth College in 2004, led by Tristan Henderson, David Kotz, and Chris McDonald. CRAWDAD datasets are hosted by IEEE DataPort as of November 2022.
Note: Please use the Data in an ethical and responsible way with the aim of doing no harm to any person or entity for the benefit of society at large. Please respect the privacy of any human subjects whose wireless-network activity is captured by the Data and comply with all applicable laws, including without limitation such applicable laws pertaining to the protection of personal information, security of data, and data breaches. Please do not apply, adapt or develop algorithms for the extraction of the true identity of users and other information of a personal nature, which might constitute personally identifiable information or protected health information under any such applicable laws. Do not publish or otherwise disclose to any other person or entity any information that constitutes personally identifiable information or protected health information under any such applicable laws derived from the Data through manual or automated techniques.
Please acknowledge the source of the Data in any publications or presentations reporting use of this Data.
Vikram Srinivasan, Mehul Motani, Wei Tsang Ooi, nus/contact, https://doi.org/10.15783/C70W2C , Date: 20060801
These datasets are part of Community Resource for Archiving Wireless Data (CRAWDAD). CRAWDAD began in 2004 at Dartmouth College as a place to share wireless network data with the research community. Its purpose was to enable access to data from real networks and real mobile users at a time when collecting such data was challenging and expensive. The archive has continued to grow since its inception, and starting in summer 2022 is being housed on IEEE DataPort.