LTLf/LDLf Non-Markovian Rewards

Ronen Brafman; Giuseppe De Giacomo; Fabio Patrizi

doi:10.1609/aaai.v32i1.11572

Authors

Ronen Brafman Ben-Gurion University
Giuseppe De Giacomo Sapienza University of Rome
Fabio Patrizi Sapienza University of Rome

DOI:

https://doi.org/10.1609/aaai.v32i1.11572

Keywords:

MDPs, non-Markovian Rewards, LTLf/LDLf

Abstract

In Markov Decision Processes (MDPs), the reward obtained in a state is Markovian, i.e., it depends only on the last state and action. This restriction makes it difficult to reward more interesting long-term behaviors, such as always closing a door after it has been opened, or providing coffee only following a request. Extending MDPs to handle non-Markovian reward functions was the subject of two previous lines of work. Both use LTL variants to specify the reward function and then compile the new model back into a Markovian model. Building on recent progress in temporal logics over finite traces, we adopt LDLf for specifying non-Markovian rewards and provide an elegant automata construction for building a Markovian model, which extends that of previous work and offers strong minimality and compositionality guarantees.
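
To make the idea of compiling a non-Markovian reward into a Markovian one more concrete, the following is a minimal sketch based on the abstract's coffee example. It is not the paper's construction: the three-state automaton is written by hand rather than derived from an LDLf formula, and the names `dfa_step`, `reward`, and `total_reward`, the proposition labels, and the reward value of 1.0 are all assumptions made for illustration. The point it shows is that once the automaton state is tracked alongside the trace, the reward depends only on the current (extended) state.

```python
from typing import FrozenSet, Iterable

# Hypothetical hand-written automaton for "coffee delivered following a request".
# States: "idle" (no outstanding request), "pending" (request outstanding),
# "served" (coffee was just delivered for an outstanding request; accepting).
INITIAL = "idle"

def dfa_step(q: str, props: FrozenSet[str]) -> str:
    """Advance the automaton on the set of propositions true at this step."""
    if q == "pending" and "coffee" in props:
        return "served"
    if "request" in props:
        return "pending"
    # No new request: an outstanding request stays outstanding,
    # anything else falls back to idle.
    return "pending" if q == "pending" else "idle"

def reward(q: str) -> float:
    """Markovian reward over the automaton component of the extended state."""
    # Assumed reward value; paid only when the accepting state is reached.
    return 1.0 if q == "served" else 0.0

def total_reward(trace: Iterable[FrozenSet[str]]) -> float:
    """Sum the rewards collected along a finite trace of proposition sets."""
    q = INITIAL
    total = 0.0
    for props in trace:
        q = dfa_step(q, props)
        total += reward(q)
    return total

if __name__ == "__main__":
    trace = [frozenset(), frozenset({"request"}), frozenset(),
             frozenset({"coffee"}), frozenset({"coffee"})]
    print(total_reward(trace))
```

In this sketch the second delivery of coffee earns nothing because no request is outstanding at that point, which is the kind of history-dependent behavior a reward over the last state and action alone cannot express. In a full model the automaton state would be paired with the MDP state to form the extended state.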
