Advice-Guided Reinforcement Learning in a non-Markovian Environment

Authors

  • Daniel Neider, Max Planck Institute for Software Systems
  • Jean-Raphael Gaglione, Ecole Polytechnique
  • Ivan Gavran, Max Planck Institute for Software Systems
  • Ufuk Topcu, University of Texas at Austin
  • Bo Wu, University of Texas at Austin
  • Zhe Xu, Arizona State University

DOI:

https://doi.org/10.1609/aaai.v35i10.17096

Keywords:

Reinforcement Learning, Neuro-Symbolic AI (NSAI), Applications, Human-in-the-loop Machine Learning

Abstract

We study a class of reinforcement learning tasks in which the agent receives sparse rewards for complex, temporally extended behaviors. For such tasks, the challenge is to efficiently augment the state space so that the reward function becomes Markovian. While some existing solutions assume that the reward function is explicitly provided to the learning algorithm (e.g., in the form of a reward machine), others learn the reward function from interactions with the environment, assuming no prior knowledge from the user. In this paper, we generalize both approaches and enable the user to give advice to the agent, representing the user's best knowledge about the reward function, which may be fragmented, partial, or even incorrect. We formalize advice as a set of DFAs and present a reinforcement learning algorithm that takes advantage of such advice, with an optimal convergence guarantee. The experiments show that well-chosen advice can reduce the number of training steps needed for convergence to an optimal policy, and can decrease the computation time to learn the reward function by up to two orders of magnitude.
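To make the core idea concrete, the following minimal Python sketch (not the authors' implementation) shows how advice given as a DFA over high-level propositions can augment the agent's state, so that a sparse, temporally extended reward becomes Markovian in the product space. All identifiers here (DFA, delta, augmented_step) and the env.step interface returning a label are illustrative assumptions, not definitions from the paper.

```python
class DFA:
    """Deterministic finite automaton over a finite alphabet of labels."""

    def __init__(self, states, initial, accepting, delta):
        self.states = states        # set of automaton states
        self.initial = initial      # initial state
        self.accepting = accepting  # set of accepting states
        self.delta = delta          # dict: (state, label) -> next state

    def step(self, q, label):
        """Advance on one observed label (self-loop if no transition is defined)."""
        return self.delta.get((q, label), q)


# Example advice: "eventually observe 'a' and then 'b'" as a 3-state DFA.
advice = DFA(
    states={0, 1, 2},
    initial=0,
    accepting={2},
    delta={(0, "a"): 1, (1, "b"): 2},
)


def augmented_step(env, q, action, advice):
    """Product construction: the learning state becomes (env_state, dfa_state).

    Assumes a hypothetical env.step(action) that also emits a high-level
    label; reward is conditioned on the DFA reaching an accepting state,
    so it is Markovian over the augmented state.
    """
    next_state, label, done = env.step(action)
    q_next = advice.step(q, label)
    reward = 1.0 if q_next in advice.accepting else 0.0
    return (next_state, q_next), reward, done
```

Because advice may be fragmented or incorrect, a set of such DFAs would only guide (rather than fix) the reward structure the agent ultimately learns.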

Published

2021-05-18

How to Cite

Neider, D., Gaglione, J.-R., Gavran, I., Topcu, U., Wu, B., & Xu, Z. (2021). Advice-Guided Reinforcement Learning in a non-Markovian Environment. Proceedings of the AAAI Conference on Artificial Intelligence, 35(10), 9073-9080. https://doi.org/10.1609/aaai.v35i10.17096

Section

AAAI Technical Track on Machine Learning III