TRANSIT_dataset

Citation Author(s):
Huize Sun (Peking University)
Submitted by:
Huize Sun
DOI:
10.21227/5rp9-9696

Abstract

The TRANSIT dataset is the default dataset of the Traffic Anomaly Simulation Tool (TRANSIT), a traffic simulation software package. The TRANSIT source code is available at:

bigzeze/TRANSIT: TRaffic ANomaly SImulation Tool

Instructions:

File Structure

Each folder under the root directory represents one type of simulated road network scenario. The second-level directories correspond to anomaly injection types and store the data files. Every anomaly scenario contains a similar set of files:

  • detectors.csv – Raw data file for fixed detector data
  • trajectory.csv – Raw data file for floating car data
  • detectors.npy – Preprocessed file for fixed detector data
  • nodes.npy – Names of fixed detectors, in the same order as detectors.npy
  • events.txt – Event sequence data
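
The layout above can be traversed with a short script. The following is a minimal sketch, assuming a two-level layout (network scenario, then anomaly type) with the file names listed above; the function names iter_scenarios and load_scenario are hypothetical and are not part of TRANSIT.

```python
from pathlib import Path

import numpy as np


def iter_scenarios(root):
    """Yield (network_name, anomaly_name, folder) for every anomaly folder.

    Assumes root/<network scenario>/<anomaly type>/ as described above.
    """
    for network_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for anomaly_dir in sorted(p for p in network_dir.iterdir() if p.is_dir()):
            yield network_dir.name, anomaly_dir.name, anomaly_dir


def load_scenario(scenario_dir, allow_pickle=False):
    """Load the preprocessed arrays and the raw event lines from one folder.

    allow_pickle is an assumption: set it to True only for trusted files,
    and only if nodes.npy turns out to store Python objects.
    """
    scenario_dir = Path(scenario_dir)
    detectors = np.load(scenario_dir / "detectors.npy")
    nodes = np.load(scenario_dir / "nodes.npy", allow_pickle=allow_pickle)
    events = (scenario_dir / "events.txt").read_text(encoding="utf-8").splitlines()
    return detectors, nodes, events
```

Calling load_scenario on each folder yielded by iter_scenarios returns the detector array, the node names, and the event log as a list of lines. The CSV files are left out here because their columns are defined by their own headers.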

Meta Data

For the CSV files, refer to the column headers in each file.

detectors.npy

This file holds an array of shape [number_of_nodes, number_of_metrics, length_of_time]. The raw detector data is aggregated at the road-segment level, and each road segment is a node, so the length of the first dimension equals the number of monitored road segments in the network. The second dimension holds the metrics for each road segment in this order: flow rate (veh/min), occupancy (%), and speed (m/s). The third dimension is time; the interval between two consecutive data points is the detection interval defined in the simulation.
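
As an illustration of this layout, the sketch below slices the array by metric and builds a time axis. It assumes the metric order given above; the constant names, the helper functions, and the interval_seconds parameter are hypothetical, and the actual detection interval must be taken from the simulation settings.

```python
import numpy as np

# Metric indices along the second dimension, in the order described above.
FLOW, OCCUPANCY, SPEED = 0, 1, 2


def split_metrics(detectors):
    """Return (flow, occupancy, speed), each of shape [nodes, time]."""
    if detectors.ndim != 3 or detectors.shape[1] < 3:
        raise ValueError("expected an array of shape [nodes, metrics, time]")
    return detectors[:, FLOW, :], detectors[:, OCCUPANCY, :], detectors[:, SPEED, :]


def time_axis(detectors, interval_seconds):
    """Return the elapsed time in seconds of each data point.

    interval_seconds is the detection interval, which this sketch
    takes as an input because it is not stored in the array.
    """
    return np.arange(detectors.shape[2]) * interval_seconds
```

For example, flow[i, t] is then the flow rate in veh/min of the i-th road segment at the t-th detection interval.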

nodes.npy

This file holds a one-dimensional array of node names, in the same order as the first dimension of detectors.npy.
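
Because the two files share the same ordering, a node name can be mapped to its row in the detector array. The following is a small sketch under that assumption; node_series is a hypothetical helper, and the node names in the usage note are made up.

```python
import numpy as np


def node_series(detectors, nodes, name):
    """Return the [metrics, time] block of detectors for the named node."""
    names = [str(n) for n in nodes]
    if len(names) != detectors.shape[0]:
        raise ValueError("nodes and detectors disagree on the number of nodes")
    # list.index raises ValueError if the name is not present.
    return detectors[names.index(name)]
```

With detectors of shape [2, 3, 4] and nodes equal to ["seg_a", "seg_b"], node_series(detectors, nodes, "seg_b") returns the second row, an array of shape [3, 4].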

events.txt

This file stores simulation logs and congestion events. Each record begins with the simulation timestep, followed by a textual description.
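
The exact record syntax is not specified here, so the sketch below makes explicit assumptions: one record per line, a numeric timestep as the first token, and whitespace before the description. parse_event_line is a hypothetical helper and may need adjusting to the real file.

```python
def parse_event_line(line):
    """Split one record into (timestep, description).

    Returns None for blank lines and for lines whose first token is not
    a number. Assumes the timestep is separated from the description by
    whitespace, optionally followed by ',', ':' or ';'.
    """
    line = line.strip()
    if not line:
        return None
    parts = line.split(maxsplit=1)
    try:
        timestep = float(parts[0].rstrip(",:;"))
    except ValueError:
        return None
    description = parts[1] if len(parts) > 1 else ""
    return timestep, description
```

Applying this function to every line of events.txt and discarding the None results gives a list of (timestep, description) pairs that can be matched against the time dimension of detectors.npy.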

Application Prospects

This dataset is currently used in an ongoing research project on deep learning-based causal discovery for traffic anomalies. It could also be used to train models for anomaly detection, traffic prediction, multimodal time series alignment in transportation, and related tasks.