Hindsight Proximal Policy Optimization based Deep Reinforcement Learning Manipulator Control

Citation Author(s):
ShengChe Su
Submitted by:
SU Sheng jez
Last updated:
Fri, 12/27/2024 - 11:55
DOI:
10.21227/4g2z-p277
License:
0

Abstract 

The demand for intelligent automation in factories has been steadily increasing. While traditional robotic arms perform simple automated tasks, deep reinforcement learning enables them to execute more complex operations. However, deep reinforcement learning for robotics often faces hard learning problems, especially in three-dimensional, continuous environments where rewards are sparse. To address this issue, this article proposes Hindsight Proximal Policy Optimization (HPPO) for intelligent robotic control. HPPO combines Proximal Policy Optimization (PPO) with Hindsight Experience Replay (HER) to improve the adaptability and sample efficiency of PPO in sparse-reward environments. In contrast to conventional reinforcement learning architectures, we adopt a multi-goal formulation, which gives the agent an explicit goal during its interactions with the environment. We also use HER to generate synthetic experience, so that the agent can learn from failed attempts and reach goals more efficiently. A series of experiments in a simulated robotic arm control environment compares HPPO with other deep reinforcement learning algorithms. The results show significant improvements for HPPO, which adapts better and is more sample-efficient in sparse-reward environments. These results support the practicality of HPPO for robotic arm control and suggest that the approach may apply to other robotic control scenarios.
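The hindsight idea in the abstract can be illustrated with a minimal sketch of HER-style goal relabeling under a sparse reward. It is not the HPPO implementation: the function names, the transition dictionary layout, the "final" relabeling strategy, and the 0.05 distance threshold are all assumptions made for illustration, and the abstract does not say how HPPO feeds relabeled data into the PPO update.

```python
import numpy as np


def sparse_reward(achieved_goal, desired_goal, threshold=0.05):
    """Return 0.0 if the achieved goal is within `threshold` of the desired
    goal, otherwise -1.0. The threshold value is an assumption."""
    diff = np.asarray(achieved_goal, dtype=float) - np.asarray(desired_goal, dtype=float)
    distance = np.linalg.norm(diff, axis=-1)
    return np.where(distance > threshold, -1.0, 0.0)


def relabel_with_final_goal(episode, threshold=0.05):
    """HER 'final' strategy (assumed here): pretend the goal was whatever the
    agent actually achieved at the end of the episode, then recompute rewards.

    `episode` is a list of dicts with the hypothetical keys
    'observation', 'action', and 'next_achieved_goal'.
    """
    new_goal = np.asarray(episode[-1]["next_achieved_goal"], dtype=float)
    relabeled = []
    for step in episode:
        relabeled.append({
            "observation": step["observation"],
            "action": step["action"],
            "next_achieved_goal": step["next_achieved_goal"],
            "desired_goal": new_goal,
            "reward": float(sparse_reward(step["next_achieved_goal"], new_goal, threshold)),
        })
    return relabeled


# A failed 3-step episode: the gripper moves along x but never reaches x = 1.0.
original_goal = np.array([1.0, 0.0, 0.0])
episode = [
    {"observation": np.zeros(3), "action": np.zeros(4),
     "next_achieved_goal": np.array([x, 0.0, 0.0])}
    for x in (0.1, 0.2, 0.3)
]
original_rewards = [
    float(sparse_reward(s["next_achieved_goal"], original_goal)) for s in episode
]
hindsight_rewards = [s["reward"] for s in relabel_with_final_goal(episode)]
```

With the original goal every step earns -1, so the episode carries no success signal. After relabeling, the last step earns 0, which turns the failed episode into a successful example for the substituted goal.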

Instructions: 

A Hindsight Proximal Policy Optimization (HPPO) implementation for MuJoCo Fetch robotics environments.
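No files or usage details accompany this record, so the following is only a sketch of how a multi-goal observation might be turned into a policy input. It assumes the dictionary layout used by goal-based robotics environments such as Fetch (keys `observation`, `achieved_goal`, `desired_goal`); the helper name `policy_input` and the array sizes are placeholders, not taken from the HPPO code.

```python
import numpy as np


def policy_input(obs):
    """Concatenate the state and the desired goal into one flat vector, so a
    single policy network can be conditioned on the goal (hypothetical helper)."""
    return np.concatenate([
        np.asarray(obs["observation"], dtype=np.float32),
        np.asarray(obs["desired_goal"], dtype=np.float32),
    ])


# Stand-in for one goal-based observation; the sizes here are placeholders.
obs = {
    "observation": np.zeros(10),
    "achieved_goal": np.zeros(3),
    "desired_goal": np.ones(3),
}
x = policy_input(obs)
```

The `achieved_goal` entry is left out of the policy input here because it is only needed to compute rewards and to relabel goals in hindsight; other designs may include it.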

 

Dataset Files

    Files have not been uploaded for this dataset