Inferring Probabilistic Reward Machines from Non-Markovian Reward Signals for Reinforcement Learning

Taylor Dohmen; Noah Topper; George Atia; Andre Beckus; Ashutosh Trivedi; Alvaro Velasquez

doi:10.1609/icaps.v32i1.19844

Authors

Taylor Dohmen, University of Colorado Boulder
Noah Topper, University of Central Florida
George Atia, University of Central Florida
Andre Beckus, Air Force Research Laboratory
Ashutosh Trivedi, University of Colorado Boulder
Alvaro Velasquez, Air Force Research Laboratory

Keywords:

Reward Machines, Non-Markovian Rewards, Active Learning, Reinforcement Learning

Abstract

The success of reinforcement learning in typical settings is predicated on Markovian assumptions on the reward signal by which an agent learns optimal policies. In recent years, the use of reward machines has relaxed this assumption by enabling a structured representation of non-Markovian rewards. In particular, such representations can be used to augment the state space of the underlying decision process, thereby facilitating non-Markovian reinforcement learning. However, these reward machines cannot capture the semantics of stochastic reward signals. In this paper, we make progress on this front by introducing probabilistic reward machines (PRMs) as a representation of non-Markovian stochastic rewards. We present an algorithm to learn PRMs from the underlying decision process and prove results around its correctness and convergence.
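
To make the idea of a stochastic, history-dependent reward concrete, the following is a minimal sketch of what a probabilistic reward machine could look like as a data structure. It is an illustration under stated assumptions, not the paper's formal definition or its learning algorithm: it assumes a PRM is a finite set of machine states whose transitions on observed labels are sampled from a distribution and emit a reward. The class name, the `transitions` layout, the `run` helper, and the coffee-delivery task are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class ProbabilisticRewardMachine:
    """Hypothetical PRM sketch: stochastic transitions that emit rewards."""
    initial: str
    # transitions[(machine_state, label)] -> list of (probability, next_state, reward)
    transitions: dict

    def step(self, state, label, rng):
        """Sample the next machine state and reward for an observed label."""
        outcomes = self.transitions[(state, label)]
        weights = [p for p, _, _ in outcomes]
        _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
        return next_state, reward


# Hypothetical task: picking up coffee succeeds with probability 0.9;
# delivering to the office pays 1 only if coffee is currently held (state u1).
prm = ProbabilisticRewardMachine(
    initial="u0",
    transitions={
        ("u0", "coffee"): [(0.9, "u1", 0.0), (0.1, "u0", 0.0)],
        ("u0", "office"): [(1.0, "u0", 0.0)],
        ("u1", "coffee"): [(1.0, "u1", 0.0)],
        ("u1", "office"): [(1.0, "u0", 1.0)],
    },
)


def run(machine, labels, seed=0):
    """Feed a label sequence through the machine and accumulate reward."""
    rng = random.Random(seed)
    state = machine.initial
    total = 0.0
    for label in labels:
        state, reward = machine.step(state, label, rng)
        total += reward
    return total, state
```

In this sketch the same label, "office", yields a different reward depending on the machine state, which is why the reward is non-Markovian with respect to the environment state alone. Pairing each environment state with the current machine state (for example, as a tuple) gives an augmented state over which the reward depends only on the current augmented state, the label, and the sampled transition.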

Published

2022-06-13

How to Cite

Dohmen, T., Topper, N., Atia, G., Beckus, A., Trivedi, A., & Velasquez, A. (2022). Inferring Probabilistic Reward Machines from Non-Markovian Reward Signals for Reinforcement Learning. Proceedings of the International Conference on Automated Planning and Scheduling, 32(1), 574-582. https://doi.org/10.1609/icaps.v32i1.19844

Issue

Vol. 32 (2022): Proceedings of the Thirty-Second International Conference on Automated Planning and Scheduling

Section

Planning and Learning Track
