Heuristic Search for Generalized Stochastic Shortest Path MDPs

Authors

  • Andrey Kolobov University of Washington, Seattle
  • Mausam Mausam University of Washington, Seattle
  • Daniel Weld University of Washington, Seattle
  • Hector Geffner ICREA and Universitat Pompeu Fabra

DOI:

https://doi.org/10.1609/icaps.v21i1.13452

Abstract

Research in efficient methods for solving infinite-horizon MDPs has so far concentrated primarily on discounted MDPs and the more general stochastic shortest path problems (SSPs). These are MDPs with 1) an optimal value function V* that is the unique solution of Bellman equation and 2) optimal policies that are the greedy policies w.r.t. V*. This paper’s main contribution is the description of a new class of MDPs, that have well-defined optimal solutions that do not comply with either 1 or 2 above. We call our new class Generalized Stochastic Shortest Path (GSSP) problems. GSSP allows more general reward structure than SSP and subsumes several established MDP types including SSP, positive-bounded, negative, and discounted-reward models. While existing efficient heuristic search algorithms like LAO* and LRTDP are not guaranteed to converge to the optimal value function for GSSPs, we present a new heuristic-search-based family of algorithms, FRET (Find, Revise, Eliminate Traps). A preliminary empirical evaluation shows that FRET solves GSSPs much more efficiently than Value Iteration.

Downloads

Published

2011-03-22

How to Cite

Kolobov, A., Mausam, M., Weld, D., & Geffner, H. (2011). Heuristic Search for Generalized Stochastic Shortest Path MDPs. Proceedings of the International Conference on Automated Planning and Scheduling, 21(1), 130-137. https://doi.org/10.1609/icaps.v21i1.13452