Thompson Sampling Based Monte-Carlo Planning in POMDPs

Aijun Bai; Feng Wu; Zongzhang Zhang; Xiaoping Chen

doi:10.1609/icaps.v24i1.13616

Authors

Aijun Bai University of Science and Technology of China
Feng Wu University of Southampton
Zongzhang Zhang National University of Singapore
Xiaoping Chen University of Science and Technology of China

DOI:

https://doi.org/10.1609/icaps.v24i1.13616

Keywords:

Planning under uncertainty, Monte-carlo planning, POMDP, MCTS

Abstract

Monte-Carlo tree search (MCTS) has been drawing great interest in recent years for planning under uncertainty. One of the key challenges is the trade-off between exploration and exploitation. To address this, we introduce a novel online planning algorithm for large POMDPs using Thompson sampling based MCTS that balances between cumulative and simple regrets. The proposed algorithm Dirichlet-Dirichlet-NormalGamma based Partially Observable Monte-Carlo Planning (D²NG-POMCP) treats the accumulated reward of performing an action from a belief state in the MCTS search tree as a random variable following an unknown distribution with hidden parameters. Bayesian method is used to model and infer the posterior distribution of these parameters by choosing the conjugate prior in the form of a combination of two Dirichlet and one NormalGamma distributions. Thompson sampling is exploited to guide the action selection in the search tree. Experimental results confirmed that our algorithm outperforms the state-of-the-art approaches on several common benchmark problems.

Thompson Sampling Based Monte-Carlo Planning in POMDPs

Authors

DOI:

Keywords:

Abstract

Downloads

Published

How to Cite

Issue

Section

Information