Two-Phase-Win Strategy for Improving the AlphaZero's Strength

Chih Hung Chen, Yen Chi Chen, Shun Shii Lin

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

AlphaZero combined Monte-Carlo Tree Search with deep neural networks that learned without human knowledge. It demonstrated that reinforcement learning through self-play could surpass human champions. The great success of AlphaZero seems to suggest that every AI task can be trained and learned without any human knowledge or human heuristics. This paper presents another viewpoint: the AlphaZero approach is good at judging the overall situation, whereas minimax search (with alpha-beta pruning) is adept at discovering partial solutions. We therefore introduce the Two-Phase-Win strategy, which combines AlphaZero with minimax search with alpha-beta pruning to improve AlphaZero's strength. The strategy improves the strength of the AlphaZero approach applied to Connect4. Experimental results show that the Two-Phase-Win strategy achieves a 58% win rate against the AlphaZero approach and does not lose a single game in a 100-game match.
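For readers unfamiliar with the search component named in the abstract, the following is a minimal sketch of generic minimax search with alpha-beta pruning on a toy game tree. It is not the authors' Two-Phase-Win strategy: the abstract does not say how the minimax search is combined with AlphaZero, and the function name `alphabeta` and the nested-list tree encoding are assumptions made only for illustration.

```python
import math


def alphabeta(node, maximizing=True, alpha=-math.inf, beta=math.inf):
    """Return the minimax value of a toy game tree using alpha-beta pruning.

    Hypothetical encoding: a node is either a number (a leaf score from the
    maximizing player's point of view) or a list of child nodes.
    """
    if not isinstance(node, list):
        return node  # leaf: its score is its value

    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # the minimizer would never allow this branch
        return best

    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta))
        beta = min(beta, best)
        if alpha >= beta:
            break  # the maximizer would never allow this branch
    return best


# Root is a maximizing node with three minimizing children.
tree = [[3, 5], [2, 9], [0, 1]]
value = alphabeta(tree)
```

In this toy tree the second and third subtrees are cut off after their first leaf, because each already yields a value no better than the 3 guaranteed by the first subtree. In a real Connect4 program the leaves would come from terminal positions or an evaluation function, which this sketch does not model.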

Original language: English
Title of host publication: Proceedings of 2019 2nd World Symposium on Communication Engineering, WSCE 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 117-121
Number of pages: 5
ISBN (Electronic): 9781728153292
Publication status: Published - Dec 2019
Event: 2nd World Symposium on Communication Engineering, WSCE 2019 - Nagoya, Japan
Duration: 20 Dec 2019 to 23 Dec 2019

Publication series

Name: Proceedings of 2019 2nd World Symposium on Communication Engineering, WSCE 2019

Conference

Conference: 2nd World Symposium on Communication Engineering, WSCE 2019
Country: Japan
City: Nagoya
Period: 2019/12/20 to 2019/12/23

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Hardware and Architecture
  • Information Systems and Management
