TY - JOUR
T1 - Cooperative dual-actor proximal policy optimization algorithm for multi-robot complex control task
AU - Baltes, Jacky
AU - Akbar, Ilham
AU - Saeedvand, Saeed
N1 - Publisher Copyright:
© 2024 Elsevier Ltd
PY - 2025/1
Y1 - 2025/1
N2 - This paper introduces a novel multi-agent Deep Reinforcement Learning (DRL) framework named the Cooperative Dual-Actor Proximal Policy Optimization (CDA-PPO) algorithm, designed to address complex cooperative learning control tasks for humanoid robots. Effective cooperation among multiple humanoid robots, particularly in scenarios involving complex walking gait control and external disturbances in dynamic environments, is a critical challenge. This is especially pertinent for tasks requiring precise coordination and control, such as joint object transportation. In many real-life scenarios, humanoid robots need to cooperate to carry large objects, a capability that is crucial for applications in logistics, manufacturing, intelligent transportation, and search-and-rescue missions. Humanoid robots have gained significant popularity, and their use in such cooperative tasks is becoming more common. To address this challenge, we propose CDA-PPO, which introduces a learning-based communication platform between agents and employs two distinct policy networks for each agent. This dual-policy approach enhances the robots’ ability to adapt to complex interactions and maintain stability while performing intricate tasks. We demonstrate the efficacy of CDA-PPO in a cooperative object-transportation scenario in which two humanoid robots collaborate to carry a table. The experimental results show that CDA-PPO significantly outperforms existing methods, such as Independent PPO (IPPO), Multi-Agent PPO (MAPPO), and Multi-Agent Twin Delayed Deep Deterministic Policy Gradient (MATD3), in terms of training efficiency, stability, reward acquisition, and cooperative balance control with effective coordination between robots. The findings underscore the potential of CDA-PPO to advance cooperative multi-agent control and pave the way for future research in complex robotics applications.
AB - This paper introduces a novel multi-agent Deep Reinforcement Learning (DRL) framework named the Cooperative Dual-Actor Proximal Policy Optimization (CDA-PPO) algorithm, designed to address complex cooperative learning control tasks for humanoid robots. Effective cooperation among multiple humanoid robots, particularly in scenarios involving complex walking gait control and external disturbances in dynamic environments, is a critical challenge. This is especially pertinent for tasks requiring precise coordination and control, such as joint object transportation. In many real-life scenarios, humanoid robots need to cooperate to carry large objects, a capability that is crucial for applications in logistics, manufacturing, intelligent transportation, and search-and-rescue missions. Humanoid robots have gained significant popularity, and their use in such cooperative tasks is becoming more common. To address this challenge, we propose CDA-PPO, which introduces a learning-based communication platform between agents and employs two distinct policy networks for each agent. This dual-policy approach enhances the robots’ ability to adapt to complex interactions and maintain stability while performing intricate tasks. We demonstrate the efficacy of CDA-PPO in a cooperative object-transportation scenario in which two humanoid robots collaborate to carry a table. The experimental results show that CDA-PPO significantly outperforms existing methods, such as Independent PPO (IPPO), Multi-Agent PPO (MAPPO), and Multi-Agent Twin Delayed Deep Deterministic Policy Gradient (MATD3), in terms of training efficiency, stability, reward acquisition, and cooperative balance control with effective coordination between robots. The findings underscore the potential of CDA-PPO to advance cooperative multi-agent control and pave the way for future research in complex robotics applications.
KW - Deep reinforcement learning
KW - Humanoid robotics
KW - Isaac Gym
KW - Proximal policy optimization
UR - http://www.scopus.com/inward/record.url?scp=85210773405&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85210773405&partnerID=8YFLogxK
U2 - 10.1016/j.aei.2024.102960
DO - 10.1016/j.aei.2024.102960
M3 - Article
AN - SCOPUS:85210773405
SN - 1474-0346
VL - 63
JO - Advanced Engineering Informatics
JF - Advanced Engineering Informatics
M1 - 102960
ER -