LRTDP Versus UCT for Online Probabilistic Planning

Authors

  • Andrey Kolobov, University of Washington, Seattle
  • Mausam, University of Washington, Seattle
  • Daniel Weld, University of Washington, Seattle

DOI:

https://doi.org/10.1609/aaai.v26i1.8362

Keywords:

Markov Decision Process, UCT, LRTDP, Online planning

Abstract

UCT, the premier method for solving games such as Go, is also becoming the dominant algorithm for probabilistic planning. Out of the five solvers at the International Probabilistic Planning Competition (IPPC) 2011, four were based on the UCT algorithm. However, while a UCT-based planner, PROST, won the contest, an LRTDP-based system, Glutton, came in a close second, outperforming other systems derived from UCT. These results raise a question: what are the strengths and weaknesses of LRTDP and UCT in practice? This paper starts answering this question by contrasting the two approaches in the context of finite-horizon MDPs. We demonstrate that in such scenarios, UCT's lack of a sound termination condition is a serious practical disadvantage. In order to handle an MDP with a large finite horizon under a time constraint, UCT forces an expert to guess a non-myopic lookahead value for which it should be able to converge on the encountered states. Mistakes in setting this parameter can greatly hurt UCT's performance. In contrast, LRTDP's convergence criterion allows for an iterative deepening strategy. Using this strategy, LRTDP automatically finds the largest lookahead value feasible under the given time constraint. As a result, LRTDP has better performance and stronger theoretical properties. We present an online version of Glutton, named Gourmand, that illustrates this analysis and outperforms PROST on the set of IPPC-2011 problems.
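The iterative deepening strategy described in the abstract can be sketched as a control loop: solve the MDP to convergence for lookahead 1, then 2, and so on, keeping the deepest value function that finishes within the time budget. The sketch below is illustrative only, not Gourmand's implementation: the toy chain MDP, the function names, and the use of exact finite-horizon backups as a stand-in for an LRTDP run to convergence are all assumptions made for brevity.

```python
import time

# Toy finite-horizon MDP: a noisy 5-state chain (illustrative only;
# not from the paper). "advance" moves right with probability 0.8.
STATES = range(5)
ACTIONS = ("stay", "advance")

def transitions(s, a):
    """Return [(prob, next_state)] pairs."""
    if a == "advance" and s < 4:
        return [(0.8, s + 1), (0.2, s)]
    return [(1.0, s)]

def reward(s, a):
    return 1.0 if s == 4 else 0.0

def solve_lookahead(h):
    """Stand-in for an LRTDP run to convergence with lookahead h:
    h exact Bellman backups, returning the depth-h value function."""
    V = {s: 0.0 for s in STATES}  # value with 0 steps to go
    for _ in range(h):
        V = {s: max(reward(s, a) +
                    sum(p * V[ns] for p, ns in transitions(s, a))
                    for a in ACTIONS)
             for s in STATES}
    return V

def iterative_deepening(budget_s):
    """Grow the lookahead until the budget runs out, keeping the
    deepest value function that converged within the budget."""
    deadline = time.monotonic() + budget_s
    best_h, best_V = 0, {s: 0.0 for s in STATES}
    h = 1
    while time.monotonic() < deadline:
        V = solve_lookahead(h)
        if time.monotonic() > deadline:
            break  # this depth did not finish in time; discard it
        best_h, best_V = h, V
        h += 1
    return best_h, best_V
```

Because each depth is only accepted after its solve completes, the planner always acts on a fully converged (if possibly shallow) value function, which is the soundness property the abstract contrasts with UCT's guessed lookahead.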

Published

2021-09-20

How to Cite

Kolobov, A., Mausam, & Weld, D. (2021). LRTDP Versus UCT for Online Probabilistic Planning. Proceedings of the AAAI Conference on Artificial Intelligence, 26(1), 1786-1792. https://doi.org/10.1609/aaai.v26i1.8362

Section

Reasoning about Plans, Processes and Actions