An Approximate Bayesian Reinforcement Learning Approach Using Robust Control Policy and Tree Search
Keywords: model-based Bayesian reinforcement learning
For autonomous robots, we propose an approximate model-based Bayesian reinforcement learning (MB-BRL) approach that reduces the number of real-world samples required while keeping the computational effort feasible. First, to approximate the solution of the original undiscounted infinite-horizon MB-BRL problem with cost-free termination, we consider a finite-horizon (FH) MB-BRL problem in which the terminal costs are given by robust control policies. The resulting performance is at least as good as that of a robust method, while the resulting policy may still choose exploratory behavior to gain information about the parametric model uncertainty and thereby reduce the number of real-world samples. Second, to obtain a feasible solution of the FH MB-BRL problem from simulation samples, we combine robust RL, Monte Carlo tree search (MCTS), and Bayesian inference. We show how previous MCTS samples can be reused for Bayesian inference at a leaf node. The proposed approach also allows the agent to choose among multiple robust policies at a leaf node. Numerical experiments on a two-dimensional peg-in-hole task demonstrate the effectiveness of the proposed approach.
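The leaf-node step described above, Bayesian inference followed by a choice among several robust policies, can be illustrated with a minimal Python sketch. It is not the implementation used in this work. The finite set of candidate model parameters, the likelihood values, the `cost_to_go` table, and the function names `posterior` and `leaf_value` are all hypothetical. Scoring each policy by its belief-weighted cost is likewise an assumption made for illustration.

```python
import numpy as np


def posterior(prior, likelihoods):
    """Bayes' rule over a finite set of candidate model parameters (assumed)."""
    unnorm = np.asarray(prior, dtype=float) * np.asarray(likelihoods, dtype=float)
    total = unnorm.sum()
    if total <= 0.0:
        raise ValueError("observations have zero likelihood under every candidate")
    return unnorm / total


def leaf_value(belief, cost_to_go):
    """Return the index and cost of the policy with the lowest expected cost.

    cost_to_go[k, i] is a hypothetical estimate of the cost incurred by
    robust policy k when the true parameter is candidate i.
    """
    expected = np.asarray(cost_to_go, dtype=float) @ np.asarray(belief, dtype=float)
    best = int(np.argmin(expected))
    return best, float(expected[best])


# Two candidate parameters with a uniform prior (illustrative numbers only).
prior = np.array([0.5, 0.5])
# Likelihood of the samples gathered so far under each candidate.
likelihoods = np.array([0.9, 0.1])
belief = posterior(prior, likelihoods)

cost_to_go = np.array([
    [10.0, 10.0],  # policy 0: conservative, same cost under both candidates
    [4.0, 30.0],   # policy 1: cheap if candidate 0 is true
    [30.0, 4.0],   # policy 2: cheap if candidate 1 is true
])

best_prior, value_prior = leaf_value(prior, cost_to_go)
best_post, value_post = leaf_value(belief, cost_to_go)
print(best_prior, value_prior)
print(best_post, value_post)
```

In this toy setting the conservative policy is preferred under the uniform prior, while the sharpened belief favors the policy specialized to the more likely candidate. This is the sense in which information about the model parameters can lower the cost assigned to a leaf.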