Best-Effort Policies for Robust Markov Decision Processes

Alessandro Abate; Thom Badings; Giuseppe De Giacomo; Francesco Fabiano

doi:10.1609/aaai.v40i43.40929

Authors

Alessandro Abate University of Oxford
Thom Badings University of Oxford
Giuseppe De Giacomo University of Oxford
Francesco Fabiano University of Oxford

DOI:

https://doi.org/10.1609/aaai.v40i43.40929

Abstract

We study the common generalization of Markov decision processes (MDPs) with sets of transition probabilities, known as robust MDPs (RMDPs). A standard goal in RMDPs is to compute a policy that maximizes the expected return under an adversarial choice of the transition probabilities. If the uncertainty in the probabilities is independent between the states, known as s-rectangularity, such optimal robust policies can be computed efficiently using robust value iteration. However, there might still be multiple optimal robust policies, which, while equivalent with respect to the worst-case, reflect different expected returns under non-adversarial choices of the transition probabilities. Hence, we propose a refined policy selection criterion for RMDPs, drawing inspiration from the notions of dominance and best-effort in game theory. Instead of seeking a policy that only maximizes the worst-case expected return, we additionally require the policy to achieve a maximal expected return under different (i.e., not fully adversarial) transition probabilities. We call such a policy an optimal robust best-effort (ORBE) policy. We prove that ORBE policies always exist, characterize their structure, and present an algorithm to compute them with a manageable overhead over standard robust value iteration. ORBE policies offer a principled tie-breaker among optimal robust policies. Numerical experiments show the feasibility of our approach.

Best-Effort Policies for Robust Markov Decision Processes

Authors

DOI:

Abstract

Downloads

Published

How to Cite

Issue

Section

Information