Regret Transfer and Parameter Optimization

Noam Brown; Tuomas Sandholm

doi:10.1609/aaai.v28i1.8832

Authors

Noam Brown Carnegie Mellon University
Tuomas Sandholm Carnegie Mellon University

DOI:

https://doi.org/10.1609/aaai.v28i1.8832

Keywords:

Incomplete-Information Games, Poker, Game Solving, No-Regret Learning, Counterfactual Regret Minimization, Regret Matching, Regret Minimization

Abstract

Regret matching is a widely-used algorithm for learning how to act. We begin by proving that regrets on actions in one setting (game) can be transferred to warm start the regrets for solving a different setting with same structure but different payoffs that can be written as a function of parameters. We prove how this can be done by carefully discounting the prior regrets. This provides, to our knowledge, the first principled warm-starting method for no-regret learning. It also extends to warm-starting the widely-adopted counterfactual regret minimization (CFR) algorithm for large incomplete-information games; we show this experimentally as well. We then study optimizing a parameter vector for a player in a two-player zero-sum game (e.g., optimizing bet sizes to use in poker). We propose a custom gradient descent algorithm that provably finds a locally optimal parameter vector while leveraging our warm-start theory to significantly save regret-matching iterations at each step. It optimizes the parameter vector while simultaneously finding an equilibrium. We present experiments in no-limit Leduc Hold'em and no-limit Texas Hold'em to optimize bet sizing. This amounts to the first action abstraction algorithm (algorithm for selecting a small number of discrete actions to use from a continuum of actions---a key preprocessing step for solving large games using current equilibrium-finding algorithms) with convergence guarantees for extensive-form games.

Regret Transfer and Parameter Optimization

Authors

DOI:

Keywords:

Abstract

Downloads

Published

How to Cite

Issue

Section

Information