PUMA: Planning Under Uncertainty with Macro-Actions

Authors

  • Ruijie He, Massachusetts Institute of Technology
  • Emma Brunskill, University of California, Berkeley
  • Nicholas Roy, Massachusetts Institute of Technology

DOI:

https://doi.org/10.1609/aaai.v24i1.7749

Keywords:

Planning under Uncertainty, POMDPs

Abstract

Planning in large, partially observable domains is challenging, especially when a long-horizon lookahead is necessary to obtain a good policy. Traditional POMDP planners that plan a different potential action for each future observation can be prohibitively expensive when planning many steps ahead. An efficient solution for planning far into the future in fully observable domains is to use temporally-extended sequences of actions, or "macro-actions." In this paper, we present a POMDP algorithm for planning under uncertainty with macro-actions (PUMA) that automatically constructs and evaluates open-loop macro-actions within forward-search planning, where the planner branches on observations only at the end of each macro-action. Additionally, we show how to incrementally refine the plan over time, resulting in an anytime algorithm that provably converges to an epsilon-optimal policy. In experiments on several large POMDP problems that require a long-horizon lookahead, PUMA outperforms existing state-of-the-art solvers.
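
The core mechanism described in the abstract, forward search that commits to an open-loop action sequence and defers observation branching to the end of each macro-action, can be sketched roughly as follows. This is a simplified illustration and not the authors' implementation; the interface names (simulate_macro, obs_prob, belief_update, the macro-action set, and the discounting scheme) are hypothetical placeholders for model-specific components.

    # Sketch (not the PUMA authors' code): forward search over open-loop
    # macro-actions, branching on observations only at macro-action ends.

    def macro_forward_search(belief, macro_actions, depth,
                             simulate_macro, observations,
                             obs_prob, belief_update, gamma=0.95):
        """Return (best_value, best_macro) starting from `belief`.

        macro_actions : iterable of open-loop action sequences
        simulate_macro: (belief, macro) -> (expected reward of executing
                        the whole sequence, resulting belief)
        observations  : observations considered at each branch point
        obs_prob      : (belief, obs) -> probability of obs
        belief_update : (belief, obs) -> posterior belief
        All of these are assumed, model-specific interfaces.
        """
        if depth == 0:
            return 0.0, None
        best_value, best_macro = float("-inf"), None
        for macro in macro_actions:
            reward, next_belief = simulate_macro(belief, macro)
            # Branch on observations only after the full macro-action.
            future = 0.0
            for obs in observations:
                p = obs_prob(next_belief, obs)
                if p == 0.0:
                    continue
                posterior = belief_update(next_belief, obs)
                value, _ = macro_forward_search(
                    posterior, macro_actions, depth - 1,
                    simulate_macro, observations,
                    obs_prob, belief_update, gamma)
                future += p * value
            total = reward + (gamma ** len(macro)) * future
            if total > best_value:
                best_value, best_macro = total, macro
        return best_value, best_macro

In this style of search the observation branching factor grows with the number of macro-action decision points rather than with the raw horizon, which is what makes deep lookahead tractable compared with branching after every primitive action.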

Published

2010-07-04

How to Cite

He, R., Brunskill, E., & Roy, N. (2010). PUMA: Planning Under Uncertainty with Macro-Actions. Proceedings of the AAAI Conference on Artificial Intelligence, 24(1), 1089-1095. https://doi.org/10.1609/aaai.v24i1.7749

Issue

Vol. 24, No. 1 (2010)

Section

Reasoning about Plans, Processes and Actions