Sequential Generative Exploration Model for Partially Observable Reinforcement Learning

Authors

  • Haiyan Yin Nanyang Technological University, Singapore
  • Jianda Chen Nanyang Technological University, Singapore
  • Sinno Jialin Pan Nanyang Technological University, Singapore
  • Sebastian Tschiatschek University of Vienna, Austria

Keywords:

Reinforcement Learning

Abstract

Many challenging partially observable reinforcement learning problems have sparse rewards, and most existing model-free algorithms struggle with such reward sparsity. In this paper, we propose a novel reward shaping approach that infers intrinsic rewards for the agent from a sequential generative model. Specifically, the sequential generative model processes a sequence of partial observations and actions from the agent's historical transitions to compile a belief state for forward dynamics prediction. We then use the error of the dynamics prediction task to infer the intrinsic rewards for the agent. Because it follows a sequential inference procedure, our method derives intrinsic rewards that better reflect the agent's surprise or curiosity about its ground-truth state. Furthermore, we formulate the inference procedure for dynamics prediction as a multi-step forward prediction task, in which the incorporated temporal abstraction helps to increase the expressiveness of the intrinsic reward signals. To evaluate our method, we conduct extensive experiments on challenging 3D navigation tasks in ViZDoom and DeepMind Lab. Empirical results show that our proposed exploration method leads to significantly faster convergence than various state-of-the-art exploration approaches in the tested navigation domains.
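
The idea described in the abstract can be illustrated with a small sketch. It is not the authors' model: the tanh recurrence, the linear predictor, the random weights, and every name below (`belief_state`, `intrinsic_reward`, `W_o`, `W_a`, `W_h`, `W_pred`) are assumptions chosen only to show how a belief state built from past observations and actions could feed a multi-step forward prediction whose error serves as an intrinsic reward.

```python
import numpy as np

def belief_state(observations, actions, W_o, W_a, W_h):
    """Fold a history of (observation, action) pairs into one belief vector.

    Assumption: a plain tanh recurrence stands in for the sequential
    generative model mentioned in the abstract.
    """
    h = np.zeros(W_h.shape[0])
    for o, a in zip(observations, actions):
        h = np.tanh(W_o @ o + W_a @ a + W_h @ h)
    return h

def intrinsic_reward(belief, future_obs, W_pred):
    """Mean squared error of a k-step forward prediction made from the belief.

    Assumption: a single linear map predicts all k future observations at once.
    """
    k, obs_dim = future_obs.shape
    predicted = (W_pred @ belief).reshape(k, obs_dim)
    return float(np.mean((predicted - future_obs) ** 2))

# Toy dimensions and randomly initialised (untrained) weights, for illustration only.
rng = np.random.default_rng(0)
obs_dim, act_dim, hid_dim, k = 4, 2, 8, 3
W_o = rng.normal(scale=0.1, size=(hid_dim, obs_dim))
W_a = rng.normal(scale=0.1, size=(hid_dim, act_dim))
W_h = rng.normal(scale=0.1, size=(hid_dim, hid_dim))
W_pred = rng.normal(scale=0.1, size=(k * obs_dim, hid_dim))

history_obs = rng.normal(size=(5, obs_dim))   # five past partial observations
history_act = rng.normal(size=(5, act_dim))   # the actions taken alongside them
future_obs = rng.normal(size=(k, obs_dim))    # the next k observations actually seen

b = belief_state(history_obs, history_act, W_o, W_a, W_h)
r_int = intrinsic_reward(b, future_obs, W_pred)
```

In this sketch a larger `r_int` means the next `k` observations were harder to predict from the belief state, so the agent would receive a larger exploration bonus for that transition.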

Published

2021-05-18

How to Cite

Yin, H., Chen, J., Pan, S. J., & Tschiatschek, S. (2021). Sequential Generative Exploration Model for Partially Observable Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 35(12), 10700-10708. Retrieved from https://ojs.aaai.org/index.php/AAAI/article/view/17279

Section

AAAI Technical Track on Machine Learning V