Two-Phase-Win Strategy for Improving the AlphaZero's Strength

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

AlphaZero combines Monte-Carlo Tree Search with deep neural networks that learn without human knowledge. It demonstrated that reinforcement learning through self-play can surpass human champions. Its great success seems to suggest that every AI task can be learned without any human knowledge or human heuristics. This paper presents another viewpoint: the AlphaZero approach is good at judging the overall situation, while minimax search with alpha-beta pruning is adept at discovering partial solutions. We therefore introduce the Two-Phase-Win strategy, which combines AlphaZero with minimax search with alpha-beta pruning to improve AlphaZero's strength. Applied to Connect4, the strategy improves the strength of the AlphaZero approach: in experiments, it achieves a 58% win rate against the AlphaZero approach and does not lose a single game in a 100-game match.
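As a rough illustration of the minimax search with alpha-beta pruning that the abstract refers to, the sketch below evaluates a tiny hand-written game tree. It is not the paper's Two-Phase-Win procedure, which the abstract does not spell out, and it has no connection to Connect4 or to AlphaZero's networks. The function name `alphabeta`, the nested-list tree encoding, and the `toy_tree` values are all assumptions made for this example.

```python
import math


def alphabeta(node, maximizing=True, alpha=-math.inf, beta=math.inf):
    """Return the minimax value of `node`, skipping branches that cannot matter.

    Hypothetical encoding: a node is either a number (a leaf score from the
    maximizing player's point of view) or a list of child nodes.
    """
    if not isinstance(node, list):
        return node  # leaf: its score is its value

    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # the minimizer already has a better option elsewhere
        return best

    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta))
        beta = min(beta, best)
        if alpha >= beta:
            break  # the maximizer already has a better option elsewhere
    return best


# Toy tree (assumed values): the root maximizes, its three children minimize.
toy_tree = [[3, 5], [2, 9], [0, 1]]
root_value = alphabeta(toy_tree)
```

In this toy tree the second and third subtrees are cut off after their first leaf, because each already guarantees the minimizer a value no better for the root than the 3 obtained from the first subtree.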

Original language: English
Title of host publication: Proceedings of 2019 2nd World Symposium on Communication Engineering, WSCE 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 117-121
Number of pages: 5
ISBN (Electronic): 9781728153292
Publication status: Published - 2019 Dec
Event: 2nd World Symposium on Communication Engineering, WSCE 2019 - Nagoya, Japan
Duration: 2019 Dec 20 to 2019 Dec 23

Publication series

Name: Proceedings of 2019 2nd World Symposium on Communication Engineering, WSCE 2019

Conference

Conference: 2nd World Symposium on Communication Engineering, WSCE 2019
Country/Territory: Japan
City: Nagoya
Period: 2019/12/20 to 2019/12/23

Keywords

  • alpha-beta pruning
  • alphazero
  • minimax search
  • monte-carlo tree search
  • two-phase-win

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Hardware and Architecture
  • Information Systems and Management
