Coordinated Multi-Agent Reinforcement Learning in Networked Distributed POMDPs

Authors

  • Chongjie Zhang, University of Massachusetts Amherst
  • Victor Lesser, University of Massachusetts Amherst

DOI

https://doi.org/10.1609/aaai.v25i1.7886

Abstract

In many multi-agent applications such as distributed sensor networks, a network of agents acts collaboratively under uncertainty and local interactions. The Networked Distributed POMDP (ND-POMDP) provides a framework for modeling such cooperative multi-agent decision making. Existing work on ND-POMDPs has focused on offline techniques that require accurate models, which are usually costly to obtain in practice. This paper presents a model-free, scalable learning approach that synthesizes multi-agent reinforcement learning (MARL) and distributed constraint optimization (DCOP). By exploiting the structured interaction in ND-POMDPs, our approach distributes the learning of the joint policy and employs DCOP techniques to coordinate the distributed learning so as to ensure global learning performance. Our approach can learn a globally optimal policy for ND-POMDPs with a property called groupwise observability. Experimental results show that, with communication during learning and execution, our approach significantly outperforms the nearly optimal non-communication policies computed offline.
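To make the high-level idea concrete, the following is a minimal, illustrative sketch (not the authors' algorithm): tabular Q-learning with a value function factored over the agents' interaction graph, where the greedy joint action is chosen by a coordination step over the graph. All names (EdgeQ, coordinate, the stub environment and reward) are hypothetical, and exhaustive search stands in for a proper DCOP solver such as max-sum.

```python
# Hypothetical sketch: factored Q-learning + DCOP-style coordination on a small
# interaction graph. Environment and reward are stubs for illustration only.
import itertools
import random
from collections import defaultdict

class EdgeQ:
    """Q-component shared by a pair of neighboring agents (one graph edge)."""
    def __init__(self, n_actions, alpha=0.1, gamma=0.95):
        self.q = defaultdict(float)      # ((obs_i, obs_j), (a_i, a_j)) -> value
        self.n_actions = n_actions
        self.alpha, self.gamma = alpha, gamma

    def update(self, obs, acts, reward, next_obs, next_acts):
        key = (obs, acts)
        target = reward + self.gamma * self.q[(next_obs, next_acts)]
        self.q[key] += self.alpha * (target - self.q[key])

def coordinate(edges, obs, n_agents, n_actions):
    """Pick the joint action maximizing the sum of edge Q-components.
    Exhaustive enumeration stands in for a real DCOP solver, so this is
    only practical for very small neighborhoods."""
    best, best_val = None, float("-inf")
    for joint in itertools.product(range(n_actions), repeat=n_agents):
        val = sum(q.q[((obs[i], obs[j]), (joint[i], joint[j]))]
                  for (i, j), q in edges.items())
        if val > best_val:
            best, best_val = joint, val
    return best

# Toy usage: 3 agents on a chain 0-1-2, binary observations and actions.
n_agents, n_actions = 3, 2
edges = {(0, 1): EdgeQ(n_actions), (1, 2): EdgeQ(n_actions)}
obs = tuple(random.randint(0, 1) for _ in range(n_agents))
for step in range(1000):
    # Epsilon-greedy over the coordinated joint action.
    joint = (coordinate(edges, obs, n_agents, n_actions)
             if random.random() > 0.2
             else tuple(random.randrange(n_actions) for _ in range(n_agents)))
    next_obs = tuple(random.randint(0, 1) for _ in range(n_agents))  # stub env
    reward = float(sum(joint))                                       # stub reward
    next_joint = coordinate(edges, next_obs, n_agents, n_actions)
    for (i, j), q in edges.items():
        q.update((obs[i], obs[j]), (joint[i], joint[j]), reward,
                 (next_obs[i], next_obs[j]), (next_joint[i], next_joint[j]))
    obs = next_obs
```

In the paper's setting, each edge component would be learned and queried locally by the two agents it connects, so coordination requires only neighbor-to-neighbor communication rather than the centralized loop used in this toy example.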

Published

2011-08-04

How to Cite

Zhang, C., & Lesser, V. (2011). Coordinated Multi-Agent Reinforcement Learning in Networked Distributed POMDPs. Proceedings of the AAAI Conference on Artificial Intelligence, 25(1), 764-770. https://doi.org/10.1609/aaai.v25i1.7886

Issue

Vol. 25 No. 1 (2011)

Section

AAAI Technical Track: Multiagent Systems