Hybrid Action Based Reinforcement Learning for Multi-Objective Compatible Autonomous Driving

Citation Author(s):
Guizhe Jin, School of Automotive Studies, Tongji University
Zhuoren Li, School of Automotive Studies, Tongji University
Leng Bo, School of Automotive Studies, Tongji University
Wei Han, School of Automotive Studies, Tongji University
Lu Xiong, School of Automotive Studies, Tongji University
Chen Sun, Department of Data and Systems Engineering, University of Hong Kong
Submitted by: Guizhe Jin
Last updated: Tue, 01/21/2025 - 12:21
DOI: 10.21227/gr5v-xg04

Abstract 

Reinforcement Learning (RL) has shown excellent performance on decision-making and control problems in autonomous driving and is increasingly applied across diverse driving scenarios. However, driving is a multi-attribute problem, which makes it difficult for current RL methods to achieve multi-objective compatibility, especially in policy execution and policy iteration. We propose a Multi-objective Ensemble-Critic reinforcement learning method with Hybrid Parametrized Action (HPA-MoEC) for multi-objective compatible autonomous driving. Experimental results in both a simulated traffic environment and the HighD dataset demonstrate that our method achieves multi-objective compatible autonomous driving in terms of driving efficiency, action consistency, and safety. It improves overall driving performance while significantly increasing training efficiency. The detailed training and testing data are presented in this dataset.

Instructions: 

The dataset includes:

1) Per-episode total reward and collision rate during training for HPA-MoEC (our method) and the comparison baselines (DQN, PPO-H, SAC-C, SAC-H).

2) Average reward, collision rate, average speed, number of lane changes, and variance of steering angle and acceleration for all methods during testing. There are two testing environments: i) with rule-based surrounding vehicles (SVs); ii) in the HighD dataset.

3) Ablation studies in which components of the proposed method are removed sequentially, with the same metrics and test environments as in (2).

4) Recorded average epistemic uncertainty and action epistemic uncertainty during training.
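As a rough illustration of how the test metrics in (2) could be recomputed from per-episode logs, the following Python sketch aggregates a list of episode records. The record layout and field names (`reward`, `collided`, `speeds`, `lane_changes`, `steering`, `accel`) are hypothetical and are not taken from the dataset files, and averaging the per-episode population variance is an assumption about how the variance metrics are defined. The sample values are made up for demonstration only.

```python
from statistics import mean, pvariance


def summarize_episodes(episodes):
    """Aggregate per-episode test logs into summary metrics.

    Each episode is a dict with hypothetical keys (not the dataset's schema):
    reward (float), collided (bool), speeds (list of float),
    lane_changes (int), steering (list of float), accel (list of float).
    """
    if not episodes:
        raise ValueError("need at least one episode")
    return {
        "average_reward": mean(ep["reward"] for ep in episodes),
        # Fraction of episodes that ended in a collision.
        "collision_rate": sum(1 for ep in episodes if ep["collided"]) / len(episodes),
        # Mean of the per-episode mean speeds.
        "average_speed": mean(mean(ep["speeds"]) for ep in episodes),
        "lane_changes": mean(ep["lane_changes"] for ep in episodes),
        # Assumption: population variance per episode, then averaged.
        "steering_variance": mean(pvariance(ep["steering"]) for ep in episodes),
        "acceleration_variance": mean(pvariance(ep["accel"]) for ep in episodes),
    }


# Made-up sample records, for demonstration only.
episodes = [
    {"reward": 10.0, "collided": False, "speeds": [20.0, 22.0],
     "lane_changes": 1, "steering": [0.0, 0.2], "accel": [1.0, 1.0]},
    {"reward": 6.0, "collided": True, "speeds": [18.0, 20.0],
     "lane_changes": 3, "steering": [0.1, 0.1], "accel": [0.0, 2.0]},
]
summary = summarize_episodes(episodes)
print(summary)
```

To use this with the actual files, the records would first have to be mapped from whatever columns the dataset provides onto these illustrative keys.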