Convergence of Fast Policy Iteration in Markov Games and Robust MDPs

Authors

  • Keith Badger, University of New Hampshire, Durham
  • Jefferson Huang, Naval Postgraduate School
  • Marek Petrik, University of New Hampshire, Durham

DOI:

https://doi.org/10.1609/aaai.v40i24.39045

Abstract

Markov games and robust MDPs are closely related models in which the goal is to compute a pair of saddle-point policies. As part of the long-standing effort to develop efficient algorithms for these models, the Filar-Tolwinski (FT) algorithm has shown considerable promise. As our first contribution, we demonstrate that FT may fail to converge to a saddle point and can instead loop indefinitely, even in small games. This finding contradicts the convergence proof given in the original paper. As our second contribution, we propose Residual Conditioned Policy Iteration (RCPI), which builds on FT but is guaranteed to converge to a saddle point. Our numerical results show that RCPI outperforms other convergent algorithms by several orders of magnitude.
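For background, the saddle-point policies referenced above can be characterized by Shapley's equation for zero-sum Markov games. This formulation is standard material and is not drawn from the paper itself; the notation below (state set S, action sets A for the maximizer and B for the minimizer, reward r, transition kernel p, discount factor gamma) is assumed for illustration. A minimal sketch in LaTeX:

% Standard zero-sum Markov game background; not taken from this paper.
% Assumed notation: states S, actions A (maximizer) and B (minimizer),
% reward r(s,a,b), transitions p(s' | s,a,b), discount gamma in [0,1).
% Shapley's equation: the saddle-point value v* satisfies, for every s in S,
\[
  v^{\star}(s)
  \;=\;
  \max_{\pi \in \Delta(A)} \; \min_{\sigma \in \Delta(B)}
  \sum_{a \in A} \sum_{b \in B} \pi(a)\,\sigma(b)
  \left[\, r(s,a,b) \;+\; \gamma \sum_{s' \in S} p(s' \mid s,a,b)\, v^{\star}(s') \,\right].
\]
% A pair of policies (pi*, sigma*) is a saddle point when each attains the
% optimum above against the other, so neither player can improve unilaterally;
% by von Neumann's minimax theorem the max-min and min-max values coincide here.

In robust MDPs, under standard rectangularity assumptions, the inner minimization is taken over an uncertainty set of transition models rather than an opponent's action distribution, which yields an analogous saddle-point condition and explains the close relationship noted in the abstract.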

Published

2026-03-14

How to Cite

Badger, K., Huang, J., & Petrik, M. (2026). Convergence of Fast Policy Iteration in Markov Games and Robust MDPs. Proceedings of the AAAI Conference on Artificial Intelligence, 40(24), 19649-19656. https://doi.org/10.1609/aaai.v40i24.39045

Section

AAAI Technical Track on Machine Learning I