Optimal Rewards versus Leaf-Evaluation Heuristics in Planning Agents

Authors

  • Jonathan Sorg, University of Michigan
  • Satinder Singh, University of Michigan
  • Richard Lewis, University of Michigan

Abstract

Planning agents often lack the computational resources needed to build full planning trees for their environments. Agent designers commonly compensate for the resulting finite-horizon approximation by applying an evaluation function at the leaf states of the planning tree. Recent work has proposed an alternative approach for overcoming computational constraints on agent design: modify the reward function. In this work, we compare this reward design approach to the common leaf-evaluation heuristic approach for improving planning agents. We show that in many agents, the reward design approach strictly subsumes the leaf-evaluation approach, i.e., there exists a reward function for every leaf-evaluation heuristic that leads to equivalent behavior, but the converse is not true. We demonstrate that this generality leads to improved performance when an agent makes approximations in addition to the finite-horizon approximation. As part of our contribution, we extend PGRD, an online reward design algorithm, to develop reward design algorithms for Sparse Sampling and UCT, two algorithms capable of planning in large state spaces.
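One direction of the subsumption claim in the abstract — that any leaf-evaluation heuristic can be reproduced by a suitably modified reward — can be illustrated with potential-based reward shaping (Ng et al., 1999). The sketch below is not from the paper: the toy chain MDP, the heuristic h, and the depth-2 planner are all hypothetical, chosen only to show that depth-limited planning with leaf evaluator h yields the same values (up to the constant h(s) at the root, hence the same greedy action) as planning with the shaped reward and a zero leaf value.

```python
import math

# Hypothetical toy MDP (not from the paper): a 5-state chain where
# moving right from state 3 into state 4 pays reward 1.0.
GAMMA = 0.9
STATES = range(5)
ACTIONS = (-1, +1)

def step(s, a):
    """Deterministic transition: move along the chain, clipped to [0, 4]."""
    return min(max(s + a, 0), 4)

def reward(s, a):
    return 1.0 if step(s, a) == 4 else 0.0

def h(s):
    """An arbitrary leaf-evaluation heuristic (assumed for illustration)."""
    return 0.2 * s

def plan(s, depth, r_fn, leaf):
    """Depth-limited lookahead value of s under reward r_fn and leaf evaluator."""
    if depth == 0:
        return leaf(s)
    return max(r_fn(s, a) + GAMMA * plan(step(s, a), depth - 1, r_fn, leaf)
               for a in ACTIONS)

def shaped_reward(s, a):
    """Potential-based shaping: fold the heuristic h into the reward itself."""
    return reward(s, a) + GAMMA * h(step(s, a)) - h(s)

# Depth-2 planning with leaf heuristic h matches planning with the shaped
# reward and a *zero* leaf value, shifted by the root constant h(s).
for s in STATES:
    v_leaf = plan(s, 2, reward, h)
    v_shaped = plan(s, 2, shaped_reward, lambda _s: 0.0)
    assert math.isclose(v_leaf, v_shaped + h(s))
```

Because the shift h(s) at the root is constant across actions, both planners pick the same action in every state; the converse construction (a reward that no leaf heuristic can mimic) is what the paper shows fails in general.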

Published

2011-08-04

How to Cite

Sorg, J., Singh, S., & Lewis, R. (2011). Optimal Rewards versus Leaf-Evaluation Heuristics in Planning Agents. Proceedings of the AAAI Conference on Artificial Intelligence, 25(1), 465-470. Retrieved from https://ojs.aaai.org/index.php/AAAI/article/view/7931

Section

AAAI Technical Track: Machine Learning