AlphaHoldem: High-Performance Artificial Intelligence for Heads-Up No-Limit Poker via End-to-End Reinforcement Learning

Authors

  • Enmin Zhao — Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences
  • Renye Yan — School of Artificial Intelligence, University of Chinese Academy of Sciences; Institute of Automation, Chinese Academy of Sciences
  • Jinqiu Li — Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences
  • Kai Li — Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences
  • Junliang Xing — Institute of Automation, Chinese Academy of Sciences; Tsinghua University; School of Artificial Intelligence, University of Chinese Academy of Sciences

DOI:

https://doi.org/10.1609/aaai.v36i4.20394

Keywords:

Domain(s) Of Application (APP), Machine Learning (ML), Multiagent Systems (MAS), Humans And AI (HAI)

Abstract

Heads-up no-limit Texas hold’em (HUNL) is the quintessential game with imperfect information. Representative prior works like DeepStack and Libratus heavily rely on counterfactual regret minimization (CFR) and its variants to tackle HUNL. However, the prohibitive computation cost of CFR iteration makes it difficult for subsequent researchers to learn the CFR model in HUNL and apply it in other practical applications. In this work, we present AlphaHoldem, a high-performance and lightweight HUNL AI obtained with an end-to-end self-play reinforcement learning framework. The proposed framework adopts a pseudo-siamese architecture to directly learn from the input state information to the output actions by competing the learned model with its different historical versions. The main technical contributions include a novel state representation of card and betting information, a multitask self-play training loss function, and a new model evaluation and selection metric to generate the final model. In a study involving 100,000 hands of poker, AlphaHoldem defeats Slumbot and DeepStack using only one PC with three days training. At the same time, AlphaHoldem only takes 2.9 milliseconds for each decision-making using only a single GPU, more than 1,000 times faster than DeepStack. We release the history data among AlphaHoldem, Slumbot, and top human professionals in the author’s GitHub repository to facilitate further studies in this direction.
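The abstract highlights a tensor-based state representation of card information as one of the technical contributions. As a minimal illustrative sketch (not the authors' exact encoding — the channel layout and helper names here are assumptions), cards can be represented as binary suit-by-rank planes that a convolutional network can consume directly:

```python
# Illustrative sketch of a card-plane encoding for HUNL state input.
# The exact tensor layout used by AlphaHoldem may differ; this only
# shows the general idea of mapping cards to a 4x13 (suit x rank) grid.

RANKS = "23456789TJQKA"  # rank index 0..12
SUITS = "cdhs"           # suit index 0..3

def encode_cards(cards):
    """Encode card strings like 'As' or 'Td' into a 4x13 0/1 grid."""
    plane = [[0] * len(RANKS) for _ in range(len(SUITS))]
    for card in cards:
        rank, suit = card[0], card[1]
        plane[SUITS.index(suit)][RANKS.index(rank)] = 1
    return plane

# Example: encode a player's hole cards.
hole = encode_cards(["As", "Kd"])
```

Separate planes of this form (hole cards, flop, turn, river) can then be stacked into the multi-channel input tensor the abstract alludes to, with betting information encoded analogously in additional channels.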

Published

2022-06-28

How to Cite

Zhao, E., Yan, R., Li, J., Li, K., & Xing, J. (2022). AlphaHoldem: High-Performance Artificial Intelligence for Heads-Up No-Limit Poker via End-to-End Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 36(4), 4689-4697. https://doi.org/10.1609/aaai.v36i4.20394

Section

AAAI Technical Track on Domain(s) Of Application