Two-Phase-Win Strategy for Improving the AlphaZero's Strength

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

AlphaZero combines Monte-Carlo Tree Search with deep neural networks that learn without human knowledge. It demonstrated that reinforcement learning through self-play can surpass human champions. AlphaZero's great success seems to suggest that every AI task can be learned without any human knowledge or heuristics. This paper presents another viewpoint: the AlphaZero approach is good at judging the overall situation, whereas minimax search with alpha-beta pruning is adept at discovering partial solutions. We therefore introduce the Two-Phase-Win strategy, which combines AlphaZero with minimax search with alpha-beta pruning to improve AlphaZero's strength. Applied to Connect4, the strategy improves on the AlphaZero approach: in experiments, the Two-Phase-Win strategy achieved a 58% win rate against the AlphaZero approach and did not lose a single game in a 100-game match.
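The abstract does not spell out how the two searches are combined, so the following Python sketch shows only one plausible reading and may differ from the paper's actual procedure. It assumes that a shallow alpha-beta search first looks for a proven forced win, that moves proven to lose are filtered out, and that every remaining decision is left to the learned player. The names `two_phase_move`, `policy_move`, and the `depth` parameter are hypothetical, and `policy_move` is a stand-in for an AlphaZero-style MCTS player, which is not implemented here.

```python
from typing import Callable, List

ROWS, COLS = 6, 7
Board = List[List[int]]  # board[row][col] is 0 (empty), 1 or -1; row 0 is the bottom


def legal_moves(board: Board) -> List[int]:
    """Columns whose top cell is still empty."""
    return [c for c in range(COLS) if board[ROWS - 1][c] == 0]


def play(board: Board, col: int, player: int) -> Board:
    """Return a copy of the board with `player`'s disc dropped into `col`."""
    new = [row[:] for row in board]
    for r in range(ROWS):
        if new[r][col] == 0:
            new[r][col] = player
            return new
    raise ValueError("column is full")


def wins(board: Board, player: int) -> bool:
    """True if `player` has four in a row in any direction."""
    for r in range(ROWS):
        for c in range(COLS):
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                if all(
                    0 <= r + i * dr < ROWS
                    and 0 <= c + i * dc < COLS
                    and board[r + i * dr][c + i * dc] == player
                    for i in range(4)
                ):
                    return True
    return False


def alphabeta(board: Board, player: int, depth: int, alpha: int, beta: int) -> int:
    """Negamax with alpha-beta pruning, from the view of `player` (to move).

    Returns +1 for a forced win within `depth` plies, -1 for a forced loss,
    and 0 when nothing is proven at this depth.
    """
    if wins(board, -player):  # the previous move already won for the opponent
        return -1
    moves = legal_moves(board)
    if depth == 0 or not moves:
        return 0
    best = -1
    for col in moves:
        score = -alphabeta(play(board, col, player), -player, depth - 1, -beta, -alpha)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:  # the opponent would never allow this line
            break
    return best


def two_phase_move(
    board: Board,
    player: int,
    policy_move: Callable[[Board, int, List[int]], int],
    depth: int = 4,
) -> int:
    """Hypothetical two-phase move selection (an assumption, not the paper's code)."""
    safe = []
    for col in legal_moves(board):
        score = -alphabeta(play(board, col, player), -player, depth - 1, -1, 1)
        if score == 1:
            return col  # phase 1: alpha-beta proved a forced win
        if score == 0:
            safe.append(col)  # not proven to lose within `depth` plies
    # Phase 2: let the learned player choose among moves not proven to lose.
    return policy_move(board, player, safe or legal_moves(board))
```

In this sketch the alpha-beta phase only ever overrides the learned player when it has a proof within `depth` plies, so the overall judgement of the position stays with the AlphaZero-style component, which matches the division of labour the abstract describes.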

Original language: English
Title of host publication: Proceedings of 2019 2nd World Symposium on Communication Engineering, WSCE 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 117-121
Number of pages: 5
ISBN (Electronic): 9781728153292
DOIs
Publication status: Published - Dec 2019
Event: 2nd World Symposium on Communication Engineering, WSCE 2019 - Nagoya, Japan
Duration: 20 Dec 2019 to 23 Dec 2019

Publication series

Name: Proceedings of 2019 2nd World Symposium on Communication Engineering, WSCE 2019

Conference

Conference: 2nd World Symposium on Communication Engineering, WSCE 2019
Country/Territory: Japan
City: Nagoya
Period: 2019/12/20 to 2019/12/23

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Hardware and Architecture
  • Information Systems and Management
