Faster Game Solving via Hyperparameter Schedules

Naifeng Zhang; Stephen Marcus McAleer; Tuomas Sandholm

doi:10.1609/aaai.v40i20.38784

Authors

Naifeng Zhang, Carnegie Mellon University
Stephen Marcus McAleer, Anthropic
Tuomas Sandholm, Carnegie Mellon University; Strategy Robot, Inc.; Strategic Machine, Inc.; Optimized Markets, Inc.

Abstract

Counterfactual regret minimization (CFR) algorithms are a foundational class of methods for solving imperfect-information games, with the time average of their iterates converging to a Nash equilibrium in two-player zero-sum games. Prior state-of-the-art variants, Discounted CFR (DCFR) and Predictive CFR+ (PCFR+), achieved the fastest known practical performance by improving convergence rates over vanilla CFR through discounting early iterations with a fixed discounting scheme. More recently, Dynamic DCFR (DDCFR) introduced agent-learned dynamic discounting schemes to further accelerate convergence, at the cost of increased complexity. To address this, we propose Hyperparameter Schedules (HSs), a remarkably simple, training-free framework that dynamically adjusts CFR discounting over time. HSs aggressively downweight early updates and gradually transition to trusting late-stage strategies, leading to substantially faster convergence with only a few lines of code modifications. We show that HSs derived from just three small extensive-form games generalize effectively to 17 diverse games (including large-scale realistic poker) in both extensive-form and normal-form settings, without any game-specific tuning. Our method establishes a new state of the art for solving two-player zero-sum games.
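To make the idea of a discounting schedule concrete, here is a minimal sketch on a small zero-sum matrix game. It assumes the standard DCFR discount weights (positive regrets scaled by t^alpha / (t^alpha + 1), negative regrets by t^beta / (t^beta + 1), average-strategy contributions by (t / (t + 1))^gamma) and lets those three parameters depend on the iteration t. The function names, the example game, and in particular `example_schedule` are hypothetical illustrations; the schedules actually proposed in the paper are not given in this abstract and are not reproduced here.

```python
from typing import Callable, List, Tuple

Params = Tuple[float, float, float]  # (alpha, beta, gamma), DCFR-style
Schedule = Callable[[int], Params]

# Hypothetical example game: a biased rock-paper-scissors payoff matrix
# for the row player (the column player receives the negation).
GAME = [[0.0, -1.0, 2.0], [1.0, 0.0, -1.0], [-2.0, 1.0, 0.0]]


def fixed_schedule(t: int) -> Params:
    # A fixed setting commonly used with DCFR: the same values every iteration.
    return 1.5, 0.0, 2.0


def example_schedule(t: int) -> Params:
    # HYPOTHETICAL schedule, for illustration only (not the paper's schedule):
    # gamma changes with the iteration count instead of staying constant.
    return 1.5, 0.0, min(2.0 + 0.001 * t, 5.0)


def regret_matching(regrets: List[float]) -> List[float]:
    # Play proportionally to positive regret; uniform if none is positive.
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    n = len(regrets)
    return [p / total for p in pos] if total > 0 else [1.0 / n] * n


def solve(payoff: List[List[float]], iters: int, schedule: Schedule):
    n, m = len(payoff), len(payoff[0])
    reg_row, reg_col = [0.0] * n, [0.0] * m
    avg_row, avg_col = [0.0] * n, [0.0] * m
    for t in range(1, iters + 1):
        x, y = regret_matching(reg_row), regret_matching(reg_col)
        # Action values for each player against the opponent's current strategy.
        u_row = [sum(payoff[i][j] * y[j] for j in range(m)) for i in range(n)]
        u_col = [-sum(payoff[i][j] * x[i] for i in range(n)) for j in range(m)]
        ev_row = sum(x[i] * u_row[i] for i in range(n))
        ev_col = sum(y[j] * u_col[j] for j in range(m))
        # The only change relative to fixed discounting: parameters depend on t.
        alpha, beta, gamma = schedule(t)
        pos_w = t ** alpha / (t ** alpha + 1)
        neg_w = t ** beta / (t ** beta + 1)
        avg_w = (t / (t + 1)) ** gamma
        # Simultaneous updates; the discount is applied after adding the
        # instantaneous regret (implementations differ on this ordering).
        for regs, utils, ev in ((reg_row, u_row, ev_row), (reg_col, u_col, ev_col)):
            for a in range(len(regs)):
                regs[a] += utils[a] - ev
                regs[a] *= pos_w if regs[a] > 0 else neg_w
        for avg, strat in ((avg_row, x), (avg_col, y)):
            for a in range(len(avg)):
                avg[a] = (avg[a] + strat[a]) * avg_w
    return ([v / sum(avg_row) for v in avg_row],
            [v / sum(avg_col) for v in avg_col])


def exploitability(payoff, x, y) -> float:
    # Row player's best-response value against y minus the value the
    # column player can hold the row player to against x.
    n, m = len(payoff), len(payoff[0])
    best_row = max(sum(payoff[i][j] * y[j] for j in range(m)) for i in range(n))
    best_col = min(sum(payoff[i][j] * x[i] for i in range(n)) for j in range(m))
    return best_row - best_col


if __name__ == "__main__":
    for name, sched in (("fixed", fixed_schedule), ("example", example_schedule)):
        x, y = solve(GAME, 2000, sched)
        print(name, exploitability(GAME, x, y))
```

Swapping `fixed_schedule` for another callable is the whole change to the solver loop, which is the sense in which a schedule needs only a few lines of modification. The sketch makes no claim about which schedule converges faster.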
