Generalized Sampling and Variance in Counterfactual Regret Minimization

Richard Gibson; Marc Lanctot; Neil Burch; Duane Szafron; Michael Bowling

doi:10.1609/aaai.v26i1.8241

Authors

Richard Gibson University of Alberta
Marc Lanctot University of Alberta
Neil Burch University of Alberta
Duane Szafron University of Alberta
Michael Bowling University of Alberta

DOI:

https://doi.org/10.1609/aaai.v26i1.8241

Keywords:

Game Theory, Sequential Decision Making

Abstract

In large extensive form games with imperfect information, Counterfactual Regret Minimization (CFR) is a popular, iterative algorithm for computing approximate Nash equilibria. While the base algorithm performs a full tree traversal on each iteration, Monte Carlo CFR (MCCFR) reduces the per iteration time cost by traversing just a sampled portion of the tree. On the other hand, MCCFR's sampled values introduce variance, and the effects of this variance were previously unknown. In this paper, we generalize MCCFR by considering any generic estimator of the sought values. We show that any choice of an estimator can be used to probabilistically minimize regret, provided the estimator is bounded and unbiased. In addition, we relate the variance of the estimator to the convergence rate of an algorithm that calculates regret directly from the estimator. We demonstrate the application of our analysis by defining a new bounded, unbiased estimator with empirically lower variance than MCCFR estimates. Finally, we use this estimator in a new sampling algorithm to compute approximate equilibria in Goofspiel, Bluff, and Texas hold'em poker. Under each of our selected sampling schemes, our new algorithm converges faster than MCCFR.

Generalized Sampling and Variance in Counterfactual Regret Minimization

Authors

DOI:

Keywords:

Abstract

Downloads

Published

How to Cite

Issue

Section

Information

Developed By

Subscription