Optimal Rewards versus Leaf-Evaluation Heuristics in Planning Agents


  • Jonathan Sorg University of Michigan
  • Satinder Singh University of Michigan
  • Richard Lewis University of Michigan


Planning agents often lack the computational resources needed to build full planning trees for their environments. Agent designers commonly overcome this finite-horizon approximation by applying an evaluation function at the leaf-states of the planning tree. Recent work has proposed an alternative approach for overcoming computational constraints on agent design: modify the reward function. In this work, we compare this reward design approach to the common leaf-evaluation heuristic approach for improving planning agents. We show that in many agents, the reward design approach strictly subsumes the leaf-evaluation approach, i.e., there exists a reward function for every leaf-evaluation heuristic that leads to equivalent behavior, but the converse is not true. We demonstrate that this generality leads to improved performance when an agent makes approximations in addition to the finite-horizon approximation. As part of our contribution, we extend PGRD, an online reward design algorithm, to develop reward design algorithms for Sparse Sampling and UCT, two algorithms capable of planning in large state spaces.




How to Cite

Sorg, J., Singh, S., & Lewis, R. (2011). Optimal Rewards versus Leaf-Evaluation Heuristics in Planning Agents. Proceedings of the AAAI Conference on Artificial Intelligence, 25(1), 465-470. Retrieved from https://ojs.aaai.org/index.php/AAAI/article/view/7931



AAAI Technical Track: Machine Learning