Best-Effort Policies for Robust Markov Decision Processes
DOI:
https://doi.org/10.1609/aaai.v40i43.40929Abstract
We study the common generalization of Markov decision processes (MDPs) with sets of transition probabilities, known as robust MDPs (RMDPs). A standard goal in RMDPs is to compute a policy that maximizes the expected return under an adversarial choice of the transition probabilities. If the uncertainty in the probabilities is independent between the states, known as s-rectangularity, such optimal robust policies can be computed efficiently using robust value iteration. However, there might still be multiple optimal robust policies, which, while equivalent with respect to the worst-case, reflect different expected returns under non-adversarial choices of the transition probabilities. Hence, we propose a refined policy selection criterion for RMDPs, drawing inspiration from the notions of dominance and best-effort in game theory. Instead of seeking a policy that only maximizes the worst-case expected return, we additionally require the policy to achieve a maximal expected return under different (i.e., not fully adversarial) transition probabilities. We call such a policy an optimal robust best-effort (ORBE) policy. We prove that ORBE policies always exist, characterize their structure, and present an algorithm to compute them with a manageable overhead over standard robust value iteration. ORBE policies offer a principled tie-breaker among optimal robust policies. Numerical experiments show the feasibility of our approach.Downloads
Published
2026-03-14
How to Cite
Abate, A., Badings, T., De Giacomo, G., & Fabiano, F. (2026). Best-Effort Policies for Robust Markov Decision Processes. Proceedings of the AAAI Conference on Artificial Intelligence, 40(43), 36120–36128. https://doi.org/10.1609/aaai.v40i43.40929
Issue
Section
AAAI Technical Track on Planning, Routing, and Scheduling