Reverse Iterative Deepening for Finite-Horizon MDPs with Large Branching Factors

Authors

  • Andrey Kolobov University of Washington, Seattle
  • Peng Dai Google Inc.
  • Mausam Mausam University of Washington, Seattle
  • Daniel Weld University of Washington, Seattle

DOI:

https://doi.org/10.1609/icaps.v22i1.13523

Keywords:

MDP, planning under uncertainty, IPPC, iterative deepening

Abstract

In contrast to previous competitions, where the problems were goal-based, the 2011 International Probabilistic Planning Competition (IPPC-2011) emphasized finite-horizon reward maximization problems with large branching factors. These MDPs modeled more realistic planning scenarios and presented challenges to the previous state-of-the-art planners (e.g., those from IPPC-2008), which were primarily based on domain determinization — a technique more suited to goal-oriented MDPs with small branching factors. Moreover, large branching factors render the existing implementations of RTDP- and LAO-style algorithms inefficient as well. In this paper we present GLUTTON, our planner at IPPC-2011 that performed well on these challenging MDPs. The main algorithm used by GLUTTON is LR2TDP, an LRTDP-based optimal algorithm for finite-horizon problems centered around the novel idea of reverse iterative deepening. We detail LR2TDP itself as well as a series of optimizations included in GLUTTON that help LR2TDP achieve competitive performance on difficult problems with large branching factors -- subsampling the transition function, separating out natural dynamics, caching transition function samples, and others. Experiments show that GLUTTON and PROST, the IPPC-2011 winner, have complementary strengths, with GLUTTON demonstrating superior performance on problems with few high-reward terminal states.

Downloads

Published

2012-05-14

How to Cite

Kolobov, A., Dai, P., Mausam, M., & Weld, D. (2012). Reverse Iterative Deepening for Finite-Horizon MDPs with Large Branching Factors. Proceedings of the International Conference on Automated Planning and Scheduling, 22(1), 146-154. https://doi.org/10.1609/icaps.v22i1.13523