Outdoor temperature data collected by taxis in Rome, Italy.

This dataset is to be used in conjunction with the roma/taxi dataset and provides the outdoor temperature of the areas in Rome where the taxis were located (289 taxicabs over 4 days).

date/time of measurement start: 2012-08-15

date/time of measurement end: 2014-02-04

collection environment: We simulate taxicabs as if they were equipped with temperature sensors attached to their vehicles. The city of Rome is divided into 9 areas, and temperature readings for each area were gathered from a weather service, allowing us to simulate each taxicab sending its sensed temperature to a central server every 6 hours.


data name: roma/taxi

note: The original dataset is the CRAWDAD roma/taxi dataset, which contains the GPS position of each taxicab. This dataset adds the outdoor temperature of the areas that the taxicabs visit while in service.




Simulated outdoor temperature data collected by taxis in Rome, Italy.

  • files: crowd_temperature.csv
  • methodology: We generate a temperature value for every active taxicab by sampling from a Gaussian distribution. The Gaussian requires a mean μ and a standard deviation σ for every run. We therefore assign a ground-truth temperature μ to every period in every grid on every day, using data from The Weather Network to match the right ground truth to the right period and grid. Each taxicab is assigned a fixed error range σ that remains the same across all of its contributions. To do so, we randomly divide the participant taxicabs into three classes. The first class, "honest", consists of taxicabs that usually sense accurate temperatures, within a 10% error range of the ground truth; it contains 145 taxicabs (50% of all participant taxicabs). The second class, "dishonest", consists of taxicabs that usually sense inaccurate temperatures, within a 30% error range of the ground truth; it contains 72 taxicabs (25%). The third class, "misleading", consists of the remaining 72 taxicabs (25%), which sense either accurate or inaccurate temperatures: for each taxicab in this class, the data generator function randomly decides whether to generate an accurate or an inaccurate temperature. The misleading class plays a major role in the results when the data is applied to a system such as a participant reputation system, because the accuracy of its contributions is uneven. As a result, each taxicab's sensed temperature contribution is based on its fixed error range and the ground truth for the day, period, and grid of its location.
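The methodology above can be illustrated with a minimal Python sketch. It is not the generator that produced crowd_temperature.csv, only one possible reading of the description. The function names (assign_classes, sensed_temperature) are hypothetical. Two points are assumptions: that the "error range" is a standard deviation equal to 10% or 30% of the ground truth, and that a misleading taxicab picks between the two at random for each reading.

```python
import random

# Class sizes stated in the description (145 + 72 + 72 = 289 taxicabs).
CLASS_SIZES = {"honest": 145, "dishonest": 72, "misleading": 72}

# Assumption: the "error range" is a standard deviation expressed as a
# fraction of the ground-truth temperature.
REL_SIGMA = {"honest": 0.10, "dishonest": 0.30}


def assign_classes(taxi_ids, rng):
    """Randomly partition the taxi IDs into the three classes."""
    ids = list(taxi_ids)
    if len(ids) != sum(CLASS_SIZES.values()):
        raise ValueError("expected exactly 289 taxi IDs")
    rng.shuffle(ids)
    classes = {}
    start = 0
    for name, size in CLASS_SIZES.items():
        for taxi_id in ids[start:start + size]:
            classes[taxi_id] = name
        start += size
    return classes


def sensed_temperature(taxi_class, ground_truth, rng):
    """Draw one simulated reading around the ground truth (mu)."""
    if taxi_class == "misleading":
        # Assumption: a misleading taxicab behaves as honest or dishonest
        # with equal probability, decided independently for each reading.
        taxi_class = rng.choice(["honest", "dishonest"])
    sigma = REL_SIGMA[taxi_class] * abs(ground_truth)
    return rng.gauss(ground_truth, sigma)


# Usage: hypothetical taxi IDs 1..289 and a hypothetical ground truth of 30.0.
rng = random.Random(42)
classes = assign_classes(range(1, 290), rng)
reading = sensed_temperature(classes[1], 30.0, rng)
```

Seeding the generator makes a run reproducible. In a full simulation the ground truth would be looked up per day, period, and grid rather than passed in as a constant.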

