P-HER: Self-Guided Reinforcement Learning Framework for Efficient Sequential Manipulation in Sparse Reward Environments

Citation Author(s):
Xingyu Lin
Submitted by:
xingyu li
Last updated:
Tue, 10/08/2024 - 01:22
DOI:
10.21227/44k3-n417
License:

Abstract 

Most reinforcement learning algorithms for robotic arm control in sparse reward environments are optimized primarily for the end-effector displacement control mode. In this mode, the output parameters must be converted into joint rotation angles through inverse kinematics before being sent to the motors, which increases computational complexity. To reduce computational cost and improve the efficiency of robotic arm control tasks in sparse reward environments, this paper proposes a reinforcement learning framework, the prioritized hindsight experience replay method (P-HER). The algorithm combines hindsight experience replay (HER) and prioritized experience replay (PER), using two independent experience buffers for failure and success events. This approach enables targeted sampling and significantly improves sampling efficiency. The effectiveness of P-HER was validated in the OpenAI Panda Gym simulation environment on the push, slide, and pick-and-place tasks across two control modes. In the joint control mode, the only output parameters are the rotation angles of the robot joints, which eliminates the need for inverse kinematics and greatly reduces computational resource usage. Experimental results show that P-HER outperforms the baseline algorithms in all three tasks, especially in the joint control mode, which is crucial for real-world robotic arm control.
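The abstract's core idea, hindsight relabeling combined with priority-weighted sampling from separate success and failure buffers, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class `DualReplayBuffer`, the functions `sparse_reward` and `her_relabel`, the parameters `success_ratio`, `alpha`, and `tol`, the "final goal" relabeling strategy, and the 0 / -1 reward convention are all assumptions chosen for illustration.

```python
import random


def sparse_reward(achieved, goal, tol=0.05):
    """Assumed sparse reward: 0.0 within `tol` of the goal, otherwise -1.0."""
    dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
    return 0.0 if dist <= tol else -1.0


def her_relabel(episode):
    """HER with the 'final' strategy (an assumption): pretend the goal was
    the position actually reached at the end of the episode."""
    new_goal = episode[-1]["achieved_goal"]
    return [
        dict(t, goal=new_goal, reward=sparse_reward(t["achieved_goal"], new_goal))
        for t in episode
    ]


class DualReplayBuffer:
    """Hypothetical buffer keeping success and failure transitions apart."""

    def __init__(self, capacity=10000, success_ratio=0.5, alpha=0.6, seed=0):
        self.capacity = capacity
        self.success_ratio = success_ratio  # share of each batch drawn from successes
        self.alpha = alpha                  # PER exponent: 0 = uniform sampling
        self.rng = random.Random(seed)
        self.success = []                   # entries are (transition, priority)
        self.failure = []

    def add(self, transition, priority=1.0):
        # Route by outcome: reward 0.0 marks a reached goal in this sketch.
        buf = self.success if transition["reward"] == 0.0 else self.failure
        buf.append((transition, priority))
        if len(buf) > self.capacity:
            buf.pop(0)  # drop the oldest entry

    def _sample_from(self, buf, k):
        if not buf or k <= 0:
            return []
        # Priority-proportional sampling; the epsilon keeps all weights positive.
        weights = [(abs(p) + 1e-6) ** self.alpha for _, p in buf]
        picked = self.rng.choices(buf, weights=weights, k=k)
        return [t for t, _ in picked]

    def sample(self, batch_size):
        n_success = round(batch_size * self.success_ratio) if self.success else 0
        if not self.failure:
            n_success = batch_size
        return (self._sample_from(self.success, n_success)
                + self._sample_from(self.failure, batch_size - n_success))
```

In use, each finished episode would be stored twice: once with its original goal and once after `her_relabel`, so that even an episode that never reached its commanded goal contributes at least one entry to the success buffer. In a full PER setup the priority passed to `add` would typically be derived from the temporal-difference error; here it is left as a plain number.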

Note to Practitioners—This study is motivated by the need to develop efficient reinforcement learning algorithms for robotic arm control tasks in sparse reward environments. In such tasks, experience replay-based algorithms are widely applied due to their effectiveness and flexibility. Most existing algorithms focus on robotic arm control tasks using the end-effector displacement control mode, which requires inverse kinematics computations and consumes substantial computational resources. The proposed P-HER algorithm significantly outperforms baseline algorithms in robotic arm control tasks under the joint control mode, offering a promising solution for real-world robotic arm control scenarios.

Instructions: 

This is the data file corresponding to the P-HER algorithm.