Exact-win strategy for overcoming AlphaZero

Yen Chi Chen, Chih Hung Chen, Shun-Shii Lin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The Monte-Carlo Tree Search (MCTS) used in AlphaZero can easily miss a critical move because it samples the search space and concentrates on the most promising moves. In addition, MCTS may sample a move many times even after that move has been explored and its game-theoretical value determined. In this paper, we propose Exact-win-MCTS, which uses sub-tree information (WIN, LOSS, DRAW, and UNKNOWN) to prune unneeded moves and thereby increase the chance of discovering critical moves. Our method improves and generalizes some previous MCTS variations as well as the AlphaZero approach. The experiments show that Exact-win-MCTS substantially improves the strength of Tic-Tac-Toe, Connect4, and especially Go programs. Finally, our Exact-win Zero defeats Leela Zero, a replication of AlphaZero and currently one of the best open-source Go programs, with a significant 61% win rate. Therefore, we are pleased to announce that Exact-win-MCTS has overcome the AlphaZero approach without using extra training time, playing time, or computing resources. As far as we know, this is the first practical idea backed by concrete experiments to beat the AlphaZero approach.
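The pruning idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract only names the four sub-tree labels, so the `Node` fields, the `resolve_status` and `select_child` helpers, the PUCT-style score, and the `c_puct` value are all assumptions made for illustration. Each status is stored from the viewpoint of the player to move at that node, and `value_sum` is assumed to be accumulated from that same viewpoint.

```python
import math
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Status(Enum):
    UNKNOWN = 0
    WIN = 1
    LOSS = 2
    DRAW = 3


@dataclass
class Node:
    prior: float = 1.0                # hypothetical policy prior for the move into this node
    visits: int = 0
    value_sum: float = 0.0            # summed values, viewpoint of the player to move here
    status: Status = Status.UNKNOWN   # exact result for the player to move here, if proven
    children: List["Node"] = field(default_factory=list)


def resolve_status(node: Node) -> Status:
    """Derive a node's exact status from its children's statuses, if possible."""
    if not node.children:
        return node.status
    statuses = [child.status for child in node.children]
    if Status.LOSS in statuses:       # some move leaves the opponent in a proven loss
        return Status.WIN
    if Status.UNKNOWN in statuses:    # an unresolved move could still turn out better
        return Status.UNKNOWN
    if Status.DRAW in statuses:       # every move is resolved; a draw is the best available
        return Status.DRAW
    return Status.LOSS                # every move leaves the opponent in a proven win


def select_child(node: Node, c_puct: float = 1.5) -> Optional[Node]:
    """Pick the best child by a PUCT-style score, skipping children with a proven status."""
    candidates = [c for c in node.children if c.status is Status.UNKNOWN]
    if not candidates:
        return None                   # nothing left to sample below this node
    sqrt_total = math.sqrt(max(1, node.visits))

    def score(child: Node) -> float:
        # Negate the child's mean value to view it from the parent's side.
        q = -child.value_sum / child.visits if child.visits else 0.0
        return q + c_puct * child.prior * sqrt_total / (1 + child.visits)

    return max(candidates, key=score)
```

In this sketch a search loop would call `resolve_status` on each node along the path during backpropagation, so that a proven result propagates toward the root and later simulations are spent only on moves whose value is still unknown.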

Original language: English
Title of host publication: 2018 International Conference on Computational Intelligence and Intelligent Systems, CIIS 2018
Publisher: Association for Computing Machinery
Pages: 26-31
Number of pages: 6
ISBN (Electronic): 9781450365956
DOI: https://doi.org/10.1145/3293475.3293486
Publication status: Published - 2018 Nov 17
Event: 2018 International Conference on Computational Intelligence and Intelligent Systems, CIIS 2018 - Phuket, Thailand
Duration: 2018 Nov 17 to 2018 Nov 19

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 2018 International Conference on Computational Intelligence and Intelligent Systems, CIIS 2018
Country: Thailand
City: Phuket
Period: 18/11/17 to 18/11/19


Keywords

  • AlphaZero
  • Exact-win Zero
  • Exact-win-MCTS
  • Leela Zero
  • MCTS pruning

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications

Cite this

Chen, Y. C., Chen, C. H., & Lin, S-S. (2018). Exact-win strategy for overcoming AlphaZero. In 2018 International Conference on Computational Intelligence and Intelligent Systems, CIIS 2018 (pp. 26-31). (ACM International Conference Proceeding Series). Association for Computing Machinery. https://doi.org/10.1145/3293475.3293486

