Offline-to-online reinforcement learning is a key strategy for moving reinforcement learning toward practical applications. The approach not only reduces the risks and costs associated with online exploration, but also accelerates the agent's adaptation to real-world environments. It consists of two phases, offline training and online fine-tuning, and each phase poses distinct challenges. During offline training, the main difficulty is learning a strong policy from a limited dataset whose coverage of the state-action distribution is incomplete.
Offline reinforcement learning aims to learn policies from a fixed dataset without further interaction with the environment. However, the limited coverage of the dataset restricts the agent's knowledge of the environment, leading to out-of-distribution (OOD) behavior and extrapolation errors. Prior work can be broadly categorized into four approaches: Q-value penalties, policy constraints, uncertainty estimation, and importance sampling. Most existing methods, however, impose overly strict penalties on OOD behavior.
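As one illustrative instance of the Q-value-penalty family (a sketch in the spirit of conservative Q-learning, not the formulation proposed here), the standard Bellman error can be augmented with a term that pushes down Q-values on actions sampled from the learned policy $\pi$ while pushing up Q-values on actions in the dataset $\mathcal{D}$, with a coefficient $\alpha$ controlling the degree of conservatism:
\[
\min_{Q}\;\; \alpha \Big( \mathbb{E}_{s \sim \mathcal{D},\, a \sim \pi(\cdot \mid s)}\big[Q(s,a)\big] - \mathbb{E}_{(s,a) \sim \mathcal{D}}\big[Q(s,a)\big] \Big) \;+\; \tfrac{1}{2}\, \mathbb{E}_{(s,a,s') \sim \mathcal{D}}\Big[\big(Q(s,a) - \mathcal{B}^{\pi}\bar{Q}(s,a)\big)^{2}\Big],
\]
where $\mathcal{B}^{\pi}\bar{Q}(s,a) = r(s,a) + \gamma\, \mathbb{E}_{a' \sim \pi(\cdot \mid s')}\big[\bar{Q}(s',a')\big]$ denotes the Bellman backup under a target network $\bar{Q}$. When $\alpha$ is large, the penalty term dominates and the learned values become pessimistic for any action outside the dataset, which is the sense in which such penalties can be overly strict.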