TY - GEN
T1 - Reinforcement Learning and Action Space Shaping for a Humanoid Agent in a Highly Dynamic Environment
AU - Song, Jyun Ting
AU - Christmann, Guilherme
AU - Jeong, Jaesik
AU - Baltes, Jacky
N1 - Publisher Copyright: © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023
Y1 - 2023
N2 - Reinforcement Learning (RL) is a powerful tool and has been increasingly used in continuous control tasks such as locomotion and balancing in robotics. In this paper, we tackle a balancing task in a highly dynamic environment, using a humanoid robot agent and a balancing board. This task requires complex continuous actuation in order for the agent to stay in a balanced state. In this work, we propose an RL algorithm structure based on the state-of-the-art Proximal Policy Optimization (PPO) using a GPU-based implementation; the agent achieves successful balancing in under 40 min of real time. We sought to examine the impact of action space shaping on sample efficiency and designed 6 distinct control modes. Our constrained parallel control modes outperform the naive baseline in both sample efficiency and variance with respect to the starting seed. The best-performing control mode, named PLS-R, uses a parallel configuration that includes the lower body and shoulder roll joints; it is 33% more sample efficient than all the other defined modes, indicating the impact of action space shaping on the sample efficiency of our approach. Our implementation is open-source and freely available at: https://github.com/NTNU-ERC/Robinion-Balance-Board-PPO.
AB - Reinforcement Learning (RL) is a powerful tool and has been increasingly used in continuous control tasks such as locomotion and balancing in robotics. In this paper, we tackle a balancing task in a highly dynamic environment, using a humanoid robot agent and a balancing board. This task requires complex continuous actuation in order for the agent to stay in a balanced state. In this work, we propose an RL algorithm structure based on the state-of-the-art Proximal Policy Optimization (PPO) using a GPU-based implementation; the agent achieves successful balancing in under 40 min of real time. We sought to examine the impact of action space shaping on sample efficiency and designed 6 distinct control modes. Our constrained parallel control modes outperform the naive baseline in both sample efficiency and variance with respect to the starting seed. The best-performing control mode, named PLS-R, uses a parallel configuration that includes the lower body and shoulder roll joints; it is 33% more sample efficient than all the other defined modes, indicating the impact of action space shaping on the sample efficiency of our approach. Our implementation is open-source and freely available at: https://github.com/NTNU-ERC/Robinion-Balance-Board-PPO.
KW - Body balancing
KW - Humanoid robot system
KW - Reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85161419846&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85161419846&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-26135-0_4
DO - 10.1007/978-3-031-26135-0_4
M3 - Conference contribution
AN - SCOPUS:85161419846
SN - 9783031261343
T3 - Studies in Computational Intelligence
SP - 29
EP - 42
BT - Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing 2022-Winter
A2 - Lee, Roger
PB - Springer Science and Business Media Deutschland GmbH
T2 - 24th ACIS International Winter Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, SNPD 2022-Winter
Y2 - 7 December 2022 through 9 December 2022
ER -