Occupation Measure Heuristics for Probabilistic Planning
For the past 25 years, heuristic search has been used to solve domain-independent probabilistic planning problems, but with heuristics that determinise the problem and ignore precious probabilistic information. To remedy this situation, we explore the use of occupation measures, which represent the expected number of times a policy executes a given action in a given state. By relaxing the well-known linear program that computes them, we derive occupation measure heuristics -- the first admissible heuristics for stochastic shortest path problems (SSPs) that take probabilities into account. We show that these heuristics can also be obtained by extending recent operator-counting heuristic formulations used in deterministic planning. Since the heuristics are formulated as linear programs over occupation measures, they can easily be extended to more complex probabilistic planning models, such as constrained SSPs (C-SSPs). Moreover, their formulation can be tightly integrated into i-dual, a recent LP-based heuristic search algorithm for (constrained) SSPs, resulting in a novel probabilistic planning approach in which policy update and heuristic computation work in unison. Our experiments in several domains demonstrate the benefits of the new heuristics and of the integrated approach.
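As a sketch of the linear program mentioned above, the occupation-measure (dual) LP for an SSP is commonly written as follows. The notation is an assumption of this sketch and is not taken from the abstract: $S$ is the set of states, $s_0$ the initial state, $G \subseteq S$ the set of goal states, $A(s)$ the actions applicable in $s$, $P(s' \mid s, a)$ the transition probability, $C(s,a)$ the action cost, $x_{s,a}$ the occupation measure of the pair $(s,a)$, and $[s = s_0]$ equals 1 when $s = s_0$ and 0 otherwise.

```latex
\begin{align*}
\min_{x \ge 0} \quad
  & \sum_{s \in S \setminus G} \sum_{a \in A(s)} x_{s,a}\, C(s,a) \\
\text{s.t.} \quad
  & \mathit{in}(s) = \sum_{s' \in S \setminus G} \sum_{a \in A(s')} x_{s',a}\, P(s \mid s', a)
      && \forall s \in S \\
  & \mathit{out}(s) = \sum_{a \in A(s)} x_{s,a}
      && \forall s \in S \setminus G \\
  & \mathit{out}(s) - \mathit{in}(s) = [s = s_0]
      && \forall s \in S \setminus G \\
  & \sum_{g \in G} \mathit{in}(g) = 1
\end{align*}
% Hypothetical one-state example: from s_0, action a costs 1, reaches the goal
% with probability p and otherwise stays in s_0. Flow conservation gives
%   x_{s_0,a} - (1 - p)\, x_{s_0,a} = 1  =>  x_{s_0,a} = 1/p,
% so the optimal expected cost is 1/p (2 for p = 0.5).
```

In this hypothetical example, an all-outcomes determinisation would estimate a cost of 1 regardless of $p$, which illustrates the kind of probabilistic information that occupation measures retain.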