Deep Recurrent Belief Propagation Network for POMDPs

Authors

  • Yuhui Wang, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics; MIIT Key Laboratory of Pattern Analysis and Machine Intelligence
  • Xiaoyang Tan, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics; MIIT Key Laboratory of Pattern Analysis and Machine Intelligence

DOI:

https://doi.org/10.1609/aaai.v35i11.17227

Keywords:

Reinforcement Learning, Planning with Markov Models (MDPs, POMDPs), Planning under Uncertainty

Abstract

In many real-world sequential decision-making tasks, especially continuous control tasks such as robotic control, observations are rarely perfect: sensory data may be incomplete, noisy, or even dynamically polluted due to unexpected malfunctions or the intrinsically low quality of the sensors. Previous methods handle these issues in the framework of POMDPs and are either deterministic, relying on feature memorization, or stochastic, relying on belief inference. In this paper, we present a new method that lies in the middle of this spectrum and combines the strengths of both approaches. In particular, the proposed method, named Deep Recurrent Belief Propagation Network (DRBPN), uses a hybrid belief updating procedure: an RNN-type feature extraction step followed by an analytical belief inference step. This significantly reduces the computational cost while faithfully capturing the complex dynamics and maintaining the uncertainty necessary for generalization. The effectiveness of the proposed method is verified on a collection of benchmark tasks, showing that our approach outperforms several state-of-the-art methods under various challenging scenarios.
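
The hybrid idea described in the abstract, a deterministic recurrent feature step followed by a closed-form belief update, can be illustrated with a toy sketch. This is not the DRBPN architecture from the paper: the tanh RNN cell, the Gaussian belief, the Kalman-style measurement update, and every name and dimension below (`rnn_step`, `gaussian_belief_update`, `H`, `R`, the random weights) are assumptions chosen only to make the two-stage pattern concrete.

```python
import numpy as np

def rnn_step(h, obs, W_h, W_o):
    """Deterministic feature extraction: one tanh RNN step (hypothetical cell)."""
    return np.tanh(W_h @ h + W_o @ obs)

def gaussian_belief_update(mean, cov, feature, H, R):
    """Closed-form (Kalman-style) update of a Gaussian belief N(mean, cov).

    Assumption: `feature` is treated as a noisy linear measurement
    H @ state + noise, with noise covariance R.
    """
    S = H @ cov @ H.T + R                      # innovation covariance
    K = cov @ H.T @ np.linalg.inv(S)           # gain
    new_mean = mean + K @ (feature - H @ mean)
    new_cov = (np.eye(len(mean)) - K @ H) @ cov
    return new_mean, new_cov

rng = np.random.default_rng(0)
obs_dim, feat_dim, state_dim = 3, 4, 2         # illustrative sizes only

# Random, untrained parameters; a real model would learn these.
W_h = 0.1 * rng.standard_normal((feat_dim, feat_dim))
W_o = 0.1 * rng.standard_normal((feat_dim, obs_dim))
H = rng.standard_normal((feat_dim, state_dim))
R = 0.5 * np.eye(feat_dim)

h = np.zeros(feat_dim)                         # recurrent feature state
mean, cov = np.zeros(state_dim), np.eye(state_dim)  # prior belief
prior_trace = np.trace(cov)

for _ in range(5):
    obs = rng.standard_normal(obs_dim)         # stand-in for a noisy sensor reading
    h = rnn_step(h, obs, W_h, W_o)             # step 1: recurrent feature
    mean, cov = gaussian_belief_update(mean, cov, h, H, R)  # step 2: analytic belief

print("belief mean:", mean)
print("belief covariance trace:", np.trace(cov))
```

Because the belief update is analytic, no sampling or iterative inference is needed at each step; the recurrent feature carries the history, and the covariance keeps an explicit measure of uncertainty. The sketch omits any state-transition (prediction) step, so the covariance here only shrinks.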

Published

2021-05-18

How to Cite

Wang, Y., & Tan, X. (2021). Deep Recurrent Belief Propagation Network for POMDPs. Proceedings of the AAAI Conference on Artificial Intelligence, 35(11), 10236-10244. https://doi.org/10.1609/aaai.v35i11.17227

Section

AAAI Technical Track on Machine Learning IV