Non-Markovian Rewards Expressed in LTL: Guiding Search Via Reward Shaping

Authors

  • Alberto Camacho, University of Toronto
  • Oscar Chen, University of Cambridge
  • Scott Sanner, University of Toronto
  • Sheila McIlraith, University of Toronto

DOI:

https://doi.org/10.1609/socs.v8i1.18421

Abstract

We propose an approach to solving Markov Decision Processes (MDPs) with non-Markovian rewards specified in Linear Temporal Logic interpreted over finite traces (LTL-f). Our approach integrates automata representations of LTL-f formulae into compiled MDPs that can be solved by off-the-shelf MDP planners, exploiting reward shaping to help guide search. Experiments with the state-of-the-art UCT-based MDP planner PROST show automata-based reward shaping to be an effective method for guiding search, producing solutions of superior quality while maintaining policy optimality guarantees.
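
To make the shaping idea concrete, here is a minimal, hedged sketch (not the authors' implementation): it tracks a hand-built DFA for the illustrative LTL-f formula F(a & F b), uses the DFA's distance-to-acceptance as a potential function, and applies potential-based reward shaping (Ng, Harada, and Russell 1999), which is known to preserve optimal policies. The formula, discount factor, and sample trace are assumptions chosen for illustration.

    from typing import Set

    GAMMA = 0.95     # illustrative discount factor (assumption)
    ACCEPTING = {2}  # accepting DFA state

    def dfa_step(q: int, labels: Set[str]) -> int:
        # Hand-built DFA for F(a & F b): see 'a' first, then 'b'.
        # 0: nothing seen; 1: 'a' seen; 2: 'b' seen after 'a' (accepting).
        if q == 0 and "a" in labels:
            return 1
        if q == 1 and "b" in labels:
            return 2
        return q

    def potential(q: int) -> float:
        # Negated graph distance from q to the nearest accepting DFA state.
        return -float({0: 2, 1: 1, 2: 0}[q])

    def shaped_reward(base: float, q: int, q_next: int) -> float:
        # Potential-based shaping: adding gamma * phi(q') - phi(q) to the
        # base reward leaves the set of optimal policies unchanged.
        return base + GAMMA * potential(q_next) - potential(q)

    if __name__ == "__main__":
        # Propositions true at each step of a sample trace (assumption).
        trace = [set(), {"a"}, set(), {"b"}]
        q, total = 0, 0.0
        for labels in trace:
            q_next = dfa_step(q, labels)
            # Non-Markovian reward: 1.0 exactly when the formula is first satisfied.
            base = 1.0 if q_next in ACCEPTING and q not in ACCEPTING else 0.0
            total += shaped_reward(base, q, q_next)
            q = q_next
        print(f"final DFA state: {q}, shaped return: {total:.3f}")

In the compiled MDP the planner sees the augmented state (s, q), so the shaped reward is Markovian; because the shaping term is potential-based, intermediate progress in the automaton (e.g., reaching q = 1 after observing 'a') yields positive guidance for search without altering which policies are optimal.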

Published

2021-09-01