Cooperative dual-actor proximal policy optimization algorithm for multi-robot complex control task

Jacky Baltes, Ilham Akbar, Saeed Saeedvand*

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

Abstract

This paper introduces a novel multi-agent Deep Reinforcement Learning (DRL) framework, the Cooperative Dual-Actor Proximal Policy Optimization (CDA-PPO) algorithm, designed for complex cooperative control tasks with humanoid robots. Effective cooperation among multiple humanoid robots is a critical challenge, particularly when it involves complex walking gait control and external disturbances in dynamic environments. This is especially pertinent for tasks requiring precise coordination and control, such as joint object transportation. In many real-life scenarios, humanoid robots may need to cooperate to carry large objects. This capability is crucial for applications in logistics, manufacturing, intelligent transportation, and search-and-rescue missions. Humanoid robots have gained significant popularity, and their use in these cooperative tasks is becoming more common. To address this challenge, we propose CDA-PPO, which introduces a learning-based communication platform between agents and employs two distinct policy networks for each agent. This dual-policy approach enhances the robots’ ability to adapt to complex interactions and maintain stability while performing intricate tasks. We demonstrate the efficacy of CDA-PPO in a cooperative object-transportation scenario, where two humanoid robots collaborate to carry a table. The experimental results show that CDA-PPO significantly outperforms traditional methods, such as Independent PPO (IPPO), Multi-Agent PPO (MAPPO), and Multi-Agent Twin Delayed Deep Deterministic Policy Gradient (MATD3), in terms of training efficiency, stability, reward acquisition, and cooperative balance control, with effective coordination between the robots. The findings underscore the potential of CDA-PPO to advance cooperative multi-agent control, paving the way for future research in complex robotics applications.

Original language: English
Article number: 102960
Journal: Advanced Engineering Informatics
Volume: 63
Publication status: Published - 2025 Jan

Keywords

  • Deep reinforcement learning
  • Humanoid robotics
  • Isaac Gym
  • Proximal policy optimization

ASJC Scopus subject areas

  • Information Systems
  • Artificial Intelligence
