Generalized Sampling and Variance in Counterfactual Regret Minimization

Authors

  • Richard Gibson University of Alberta
  • Marc Lanctot University of Alberta
  • Neil Burch University of Alberta
  • Duane Szafron University of Alberta
  • Michael Bowling University of Alberta

DOI:

https://doi.org/10.1609/aaai.v26i1.8241

Keywords:

Game Theory, Sequential Decision Making

Abstract

In large extensive form games with imperfect information, Counterfactual Regret Minimization (CFR) is a popular, iterative algorithm for computing approximate Nash equilibria. While the base algorithm performs a full tree traversal on each iteration, Monte Carlo CFR (MCCFR) reduces the per iteration time cost by traversing just a sampled portion of the tree. On the other hand, MCCFR's sampled values introduce variance, and the effects of this variance were previously unknown. In this paper, we generalize MCCFR by considering any generic estimator of the sought values. We show that any choice of an estimator can be used to probabilistically minimize regret, provided the estimator is bounded and unbiased. In addition, we relate the variance of the estimator to the convergence rate of an algorithm that calculates regret directly from the estimator. We demonstrate the application of our analysis by defining a new bounded, unbiased estimator with empirically lower variance than MCCFR estimates. Finally, we use this estimator in a new sampling algorithm to compute approximate equilibria in Goofspiel, Bluff, and Texas hold'em poker. Under each of our selected sampling schemes, our new algorithm converges faster than MCCFR.

Downloads

Published

2021-09-20

How to Cite

Gibson, R., Lanctot, M., Burch, N., Szafron, D., & Bowling, M. (2021). Generalized Sampling and Variance in Counterfactual Regret Minimization. Proceedings of the AAAI Conference on Artificial Intelligence, 26(1), 1355-1361. https://doi.org/10.1609/aaai.v26i1.8241

Issue

Section

AAAI Technical Track: Multiagent Systems