AlphaHoldem: High-Performance Artificial Intelligence for Heads-Up No-Limit Poker via End-to-End Reinforcement Learning

Authors

  • Enmin Zhao — Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences
  • Renye Yan — School of Artificial Intelligence, University of Chinese Academy of Sciences; Institute of Automation, Chinese Academy of Sciences
  • Jinqiu Li — Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences
  • Kai Li — Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences
  • Junliang Xing — Institute of Automation, Chinese Academy of Sciences; Tsinghua University; School of Artificial Intelligence, University of Chinese Academy of Sciences

DOI:

https://doi.org/10.1609/aaai.v36i4.20394

Keywords:

Domain(s) Of Application (APP), Machine Learning (ML), Multiagent Systems (MAS), Humans And AI (HAI)

Abstract

Heads-up no-limit Texas hold’em (HUNL) is the quintessential game with imperfect information. Representative prior works like DeepStack and Libratus heavily rely on counterfactual regret minimization (CFR) and its variants to tackle HUNL. However, the prohibitive computation cost of CFR iteration makes it difficult for subsequent researchers to learn the CFR model in HUNL and apply it in other practical applications. In this work, we present AlphaHoldem, a high-performance and lightweight HUNL AI obtained with an end-to-end self-play reinforcement learning framework. The proposed framework adopts a pseudo-siamese architecture to directly learn from the input state information to the output actions by competing the learned model with its different historical versions. The main technical contributions include a novel state representation of card and betting information, a multitask self-play training loss function, and a new model evaluation and selection metric to generate the final model. In a study involving 100,000 hands of poker, AlphaHoldem defeats Slumbot and DeepStack using only one PC with three days training. At the same time, AlphaHoldem only takes 2.9 milliseconds for each decision-making using only a single GPU, more than 1,000 times faster than DeepStack. We release the history data among AlphaHoldem, Slumbot, and top human professionals in the author’s GitHub repository to facilitate further studies in this direction.
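The abstract highlights a tensor-based state representation of card information as one of the technical contributions. As a minimal illustrative sketch (not the authors' exact encoding — the channel layout and helper names here are assumptions), cards can be represented as binary suit-by-rank planes that a convolutional network can consume directly:

```python
# Illustrative sketch of a card-plane encoding for HUNL state input.
# The exact tensor layout used by AlphaHoldem may differ; this only
# shows the general idea of mapping cards to a 4x13 (suit x rank) grid.

RANKS = "23456789TJQKA"  # rank index 0..12
SUITS = "cdhs"           # suit index 0..3

def encode_cards(cards):
    """Encode card strings like 'As' or 'Td' into a 4x13 0/1 grid."""
    plane = [[0] * len(RANKS) for _ in range(len(SUITS))]
    for card in cards:
        rank, suit = card[0], card[1]
        plane[SUITS.index(suit)][RANKS.index(rank)] = 1
    return plane

# Example: encode a player's hole cards.
hole = encode_cards(["As", "Kd"])
```

Separate planes of this form (hole cards, flop, turn, river) can then be stacked into the multi-channel input tensor the abstract alludes to, with betting information encoded analogously in additional channels.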

Published

2022-06-28

How to Cite

Zhao, E., Yan, R., Li, J., Li, K., & Xing, J. (2022). AlphaHoldem: High-Performance Artificial Intelligence for Heads-Up No-Limit Poker via End-to-End Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 36(4), 4689-4697. https://doi.org/10.1609/aaai.v36i4.20394

Section

AAAI Technical Track on Domain(s) Of Application