TY - GEN
T1 - Exact-win strategy for overcoming AlphaZero
AU - Chen, Yen Chi
AU - Chen, Chih Hung
AU - Lin, Shun Shii
N1 - Publisher Copyright:
© 2018 Association for Computing Machinery.
PY - 2018/11/17
Y1 - 2018/11/17
N2 - The Monte-Carlo Tree Search used in the AlphaZero may easily miss a critical move because it is based on sampling search space and focuses on the most promising moves. In addition, the Monte-Carlo Tree Search may sample a move for many times even if this move has been explored with a determined game-theoretical value. In this paper, we propose an Exact-win-MCTS that makes use of sub-tree’s information (WIN, LOSS, DRAW, and UNKNOWN) to prune unneeded moves to increase the opportunities of discovering the critical moves. Our method improves and generalizes some previous MCTS variations as well as the AlphaZero approach. The experiments show that our Exact-win-MCTS substantially promotes the strengths of Tic-Tac-Toe, Connect4, and Go programs especially. Finally, our Exact-win Zero defeats the Leela Zero, which is a replication of AlphaZero and is currently one of the best open-source Go programs, with a significant 61% win rate. Therefore, we are pleased to announce that our Exact-win-MCTS has overcome the AlphaZero approach without using extra training time, playing time, or computer resources. As far as we know, this is the first practical idea with concrete experiments to beat the AlphaZero approach.
AB - The Monte-Carlo Tree Search used in the AlphaZero may easily miss a critical move because it is based on sampling search space and focuses on the most promising moves. In addition, the Monte-Carlo Tree Search may sample a move for many times even if this move has been explored with a determined game-theoretical value. In this paper, we propose an Exact-win-MCTS that makes use of sub-tree’s information (WIN, LOSS, DRAW, and UNKNOWN) to prune unneeded moves to increase the opportunities of discovering the critical moves. Our method improves and generalizes some previous MCTS variations as well as the AlphaZero approach. The experiments show that our Exact-win-MCTS substantially promotes the strengths of Tic-Tac-Toe, Connect4, and Go programs especially. Finally, our Exact-win Zero defeats the Leela Zero, which is a replication of AlphaZero and is currently one of the best open-source Go programs, with a significant 61% win rate. Therefore, we are pleased to announce that our Exact-win-MCTS has overcome the AlphaZero approach without using extra training time, playing time, or computer resources. As far as we know, this is the first practical idea with concrete experiments to beat the AlphaZero approach.
KW - AlphaZero
KW - Exact-win zero
KW - Exact-win-MCTS
KW - Leela zero
KW - MCTS pruning
UR - http://www.scopus.com/inward/record.url?scp=85062270429&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85062270429&partnerID=8YFLogxK
U2 - 10.1145/3293475.3293486
DO - 10.1145/3293475.3293486
M3 - Conference contribution
AN - SCOPUS:85062270429
T3 - ACM International Conference Proceeding Series
SP - 26
EP - 31
BT - 2018 International Conference on Computational Intelligence and Intelligent Systems, CIIS 2018
PB - Association for Computing Machinery
T2 - 2018 International Conference on Computational Intelligence and Intelligent Systems, CIIS 2018
Y2 - 17 November 2018 through 19 November 2018
ER -