Pruning for Monte Carlo Distributed Reinforcement Learning in Decentralized POMDPs

Authors

  • Bikramjit Banerjee, The University of Southern Mississippi

DOI:

https://doi.org/10.1609/aaai.v27i1.8670

Keywords:

Decentralized Planning, Reinforcement Learning, Monte Carlo Tree Search

Abstract

Decentralized partially observable Markov decision processes (Dec-POMDPs) offer a powerful modeling technique for realistic multi-agent coordination problems under uncertainty. Prevalent solution techniques are centralized and assume prior knowledge of the model. Recently, a Monte Carlo-based distributed reinforcement learning approach was proposed, in which agents take turns learning best responses to each other’s policies. This decentralizes the policy computation problem and relaxes the reliance on full knowledge of the problem parameters. However, this Monte Carlo approach has a large sample complexity, which we address in this paper. In particular, we propose and analyze a modified version of the previous algorithm that adaptively eliminates parts of the experience tree from further exploration, thus requiring fewer samples while ensuring unchanged confidence in the learned value function. Experiments demonstrate a significant reduction in sample complexity (maximum reductions ranging from 61% to 91% across different benchmark Dec-POMDP problems), with the final policies often being better due to more focused exploration.
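
The general idea of adaptively eliminating parts of a sampled tree once enough evidence rules them out can be illustrated with a generic confidence-bound elimination scheme. The sketch below is a hypothetical, simplified illustration only, not the paper's algorithm: the names Branch and prune_and_sample, the Hoeffding-style confidence radius, and the assumption of returns in [0, 1] are all illustrative choices.

```python
# Minimal sketch (hypothetical, not the paper's algorithm): prune branches of a
# Monte Carlo experience tree whose upper confidence bound falls below the best
# branch's lower confidence bound, so further samples go only to branches that
# could still be optimal.

import math
import random


class Branch:
    """Running statistics for one branch (e.g., an action at a tree node)."""

    def __init__(self, name):
        self.name = name
        self.n = 0          # number of Monte Carlo samples so far
        self.total = 0.0    # sum of observed returns, assumed to lie in [0, 1]

    def sample(self, simulate):
        self.total += simulate(self.name)
        self.n += 1

    def mean(self):
        return self.total / self.n

    def radius(self, delta):
        # Hoeffding confidence radius for returns bounded in [0, 1]
        return math.sqrt(math.log(2.0 / delta) / (2.0 * self.n))


def prune_and_sample(branches, simulate, samples_per_round=20, rounds=50, delta=0.05):
    """Repeatedly sample the surviving branches, then eliminate any branch whose
    upper confidence bound is below the best lower confidence bound."""
    active = list(branches)
    for _ in range(rounds):
        for b in active:
            for _ in range(samples_per_round):
                b.sample(simulate)
        best_lower = max(b.mean() - b.radius(delta) for b in active)
        active = [b for b in active if b.mean() + b.radius(delta) >= best_lower]
        if len(active) == 1:
            break
    return active


if __name__ == "__main__":
    # Toy simulator: branch "a2" has the highest expected return.
    true_means = {"a0": 0.40, "a1": 0.55, "a2": 0.70}
    simulate = lambda name: 1.0 if random.random() < true_means[name] else 0.0

    survivors = prune_and_sample([Branch(n) for n in true_means], simulate)
    print("surviving branches:", [b.name for b in survivors])
```

In this toy setting, branches "a0" and "a1" are typically eliminated after a few rounds, so the remaining sampling budget concentrates on "a2"; the paper applies the same kind of confidence-preserving elimination to parts of the experience tree in the distributed Dec-POMDP learner.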

Published

2013-06-30

How to Cite

Banerjee, B. (2013). Pruning for Monte Carlo Distributed Reinforcement Learning in Decentralized POMDPs. Proceedings of the AAAI Conference on Artificial Intelligence, 27(1), 88-94. https://doi.org/10.1609/aaai.v27i1.8670