TRANSIT_dataset

Name: TRANSIT_dataset
Creator: Huize Sun
Keywords: Transportation

Citation Author(s): Huize Sun (Peking University)
Submitted by: Huize Sun
Last updated: Mon, 04/28/2025 - 02:32
DOI: 10.21227/5rp9-9696
Data Format: *.csv, *.npy
Links: TRANSIT code


Categories: Transportation

Keywords: Intelligent Transportation Systems, Multimodal, computer simulation, anomalous dataset


Abstract

The TRANSIT dataset is the default dataset of the simulation software Traffic Anomaly Simulation Tool (TRANSIT). The TRANSIT source code is available at:

bigzeze/TRANSIT: TRaffic ANomaly SImulation Tool

Instructions:

File Structure

Each folder under the root directory represents one type of simulated road network scenario. The second-level directories denote anomaly injection types and hold the corresponding data files. Each anomaly scenario contains a similar set of files, including:

detectors.csv – Raw data file for fixed detector data
trajectory.csv – Raw data file for floating car data
detectors.npy – Preprocessed file for fixed detector data
nodes.npy – Names of the nodes (road segments), in the same order as detectors.npy
events.txt – Event sequence data
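A minimal Python sketch for enumerating the scenario folders, assuming only that every folder holding a dataset contains a detectors.npy file. The function name find_datasets and the folder names used in testing are hypothetical. Because it searches recursively with pathlib's rglob, it does not depend on exactly how deep the data files sit below the root.

```python
from pathlib import Path
from typing import List, Tuple


def find_datasets(root: Path) -> List[Tuple[str, ...]]:
    """Return the relative folder path (as parts) of every folder holding detectors.npy.

    Assumption: each dataset folder contains a detectors.npy file.
    """
    root = Path(root)
    found = []
    for marker in sorted(root.rglob("detectors.npy")):
        # e.g. ("<scenario>", "<anomaly_type>") if the files sit in second-level folders
        found.append(marker.parent.relative_to(root).parts)
    return found
```

For example, calling find_datasets on the directory where the archive was extracted returns one tuple per dataset folder; the actual scenario and anomaly names depend on the downloaded files.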

Metadata

For the CSV files, the meaning of each column is given by the file's header row.
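Since the CSV columns are documented only through their header rows, a quick way to inspect them is to read the header plus one sample row. This sketch uses Python's standard csv module and assumes comma-separated, UTF-8 files; peek_rows and peek_csv are hypothetical helper names.

```python
import csv
from pathlib import Path
from typing import Dict, Iterable, List, Optional, Tuple


def peek_rows(lines: Iterable[str]) -> Tuple[List[str], Optional[Dict[str, str]]]:
    """Return the header columns and the first data row (as a dict) of CSV text."""
    reader = csv.DictReader(lines)
    first = next(reader, None)  # None when the input holds only a header (or nothing)
    return list(reader.fieldnames or []), first


def peek_csv(path: Path) -> Tuple[List[str], Optional[Dict[str, str]]]:
    """Open a CSV file (assumed comma-separated, UTF-8) and peek at its header."""
    with open(path, newline="", encoding="utf-8") as f:
        return peek_rows(f)
```

For example, peek_csv(Path("detectors.csv"))[0] would list the columns of the fixed detector file; the path is a placeholder for a file inside one of the scenario folders.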

detectors.npy

It is an array with the shape [number_of_nodes, number_of_metrics, length_of_time]. The raw detector data is aggregated at the road segment level (nodes), so the first dimension has one entry per monitored road segment in the network. The second dimension holds the metrics, storing flow rate (veh/min), occupancy (%), and speed (m/s), in that order, for each road segment. The third dimension is time; the interval between two consecutive data points is determined by the detection interval defined in the simulation.

nodes.npy

It is a one-dimensional array storing the names of the nodes, in the same order as the first dimension of detectors.npy.
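The following sketch pulls the three metric series for one road segment out of detectors.npy, relying on the documented layout: axis 0 follows the node order in nodes.npy, and axis 1 holds flow, occupancy, and speed in that order. The helper name node_series and the metric keys are hypothetical, and the demo uses a small synthetic array rather than real TRANSIT data.

```python
import numpy as np

# Order of axis 1 in detectors.npy, as documented: flow, occupancy, speed.
METRICS = ("flow_veh_per_min", "occupancy_pct", "speed_m_per_s")


def node_series(detectors: np.ndarray, nodes: np.ndarray, node_name: str) -> dict:
    """Return {metric: 1-D time series} for one node (road segment)."""
    if detectors.ndim != 3 or detectors.shape[1] != len(METRICS):
        raise ValueError(f"expected shape (nodes, 3, time), got {detectors.shape}")
    if detectors.shape[0] != len(nodes):
        raise ValueError("nodes.npy and detectors.npy disagree on the node count")
    # nodes.npy uses the same order as axis 0 of detectors.npy.
    idx = [str(n) for n in nodes].index(node_name)  # ValueError if the name is absent
    return {metric: detectors[idx, k, :] for k, metric in enumerate(METRICS)}


if __name__ == "__main__":
    # Synthetic stand-in: 2 nodes x 3 metrics x 4 time steps (not real TRANSIT data).
    det = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
    names = np.array(["seg_A", "seg_B"])
    print(node_series(det, names, "seg_B")["speed_m_per_s"])  # [20. 21. 22. 23.]
```

With the real files, the arrays would be loaded with np.load("detectors.npy") and np.load("nodes.npy", allow_pickle=True); allow_pickle is only needed if the names were saved as an object array, which is an assumption we have not verified.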

events.txt

It stores simulation log entries and congestion events. Each record begins with the simulation timestep, followed by a textual description.
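The exact line layout of events.txt is described only as "timestep, then description", so the parser below is a sketch under an assumption: each record starts with a numeric timestep, optionally followed by ':' or ',', and any line that does not start with a number continues the previous record. parse_events is a hypothetical name; the regular expression may need adjusting to the actual file.

```python
import re
from typing import Iterable, List, Tuple

# Assumed record layout: "<timestep>[:|,] <description>" at the start of a line.
_RECORD = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*[:,]?\s*(.*?)\s*$")


def parse_events(lines: Iterable[str]) -> List[Tuple[float, str]]:
    """Parse event lines into (timestep, description) tuples."""
    records: List[Tuple[float, str]] = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        m = _RECORD.match(line)
        if m:
            records.append((float(m.group(1)), m.group(2)))
        elif records:
            # Assumption: an unmatched line continues the previous description.
            t, text = records[-1]
            records[-1] = (t, f"{text} {line.strip()}")
    return records
```

Usage: with open("events.txt", encoding="utf-8") as f: events = parse_events(f)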

Application Prospects

This dataset is currently used in an ongoing research project on deep learning-based causal discovery for traffic anomalies. It could also be used to train models for anomaly detection, traffic prediction, multimodal time-series alignment in transportation, and related tasks.