Abstract—Most reinforcement learning algorithms for robotic arm control in sparse-reward environments are optimized primarily for the end-effector displacement control mode.