Solving Transition-Independent Multi-Agent MDPs with Sparse Interactions

Authors

  • Joris Scharpff, Delft University of Technology
  • Diederik Roijers, University of Amsterdam
  • Frans Oliehoek, University of Amsterdam and University of Liverpool
  • Matthijs Spaan, Delft University of Technology
  • Mathijs de Weerdt, Delft University of Technology

DOI:

https://doi.org/10.1609/aaai.v30i1.10405

Keywords:

Markov Decision Process, Transition-independent Multi-agent MDPs, Reward interactions, Conditional Return Graphs

Abstract

In cooperative multi-agent sequential decision making under uncertainty, agents must coordinate to find an optimal joint policy that maximises joint value. Typical algorithms exploit additive structure in the value function, but in the fully-observable multi-agent MDP (MMDP) setting such structure is not present. We propose a new optimal solver for transition-independent MMDPs, in which agents can affect only their own state but their reward depends on joint transitions. We represent these dependencies compactly in conditional return graphs (CRGs). Using CRGs, the value of a joint policy and bounds on partially specified joint policies can be computed efficiently. We propose CoRe, a novel branch-and-bound policy search algorithm that builds on CRGs. CoRe typically requires less runtime than available alternatives and finds solutions to previously unsolvable problems.
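Two ingredients named in the abstract, an additively decomposable reward with a sparse interaction term and branch-and-bound search that prunes partially specified joint policies using optimistic bounds, can be illustrated in miniature. The Python sketch below is not the paper's CRG data structure or the CoRe algorithm; it is a hypothetical toy with two agents, open-loop (state-free) joint policies over a two-step horizon, and made-up reward numbers, and all names in it (joint_reward, upper_bound, branch_and_bound) are illustrative only.

# Illustrative sketch only, not the authors' CRG/CoRe implementation.
# Reward = sum of individual rewards + a sparse interaction term over
# joint actions; a branch-and-bound search prunes partially specified
# joint policies with an admissible (optimistic) upper bound.
from itertools import product

AGENTS = [0, 1]
ACTIONS = ["a", "b"]   # per-agent action set (hypothetical)
HORIZON = 2            # finite planning horizon (hypothetical)

def local_reward(agent, action):
    """Individual reward: depends only on the agent's own action."""
    return {"a": 1.0, "b": 0.0}[action]

def interaction_reward(joint_action):
    """Sparse interaction: a penalty only when both agents pick 'a'."""
    return -3.0 if joint_action == ("a", "a") else 0.0

def joint_reward(joint_action):
    return sum(local_reward(i, a) for i, a in enumerate(joint_action)) \
           + interaction_reward(joint_action)

def evaluate(policy):
    """Value of an open-loop joint policy: one joint action per step."""
    return sum(joint_reward(ja) for ja in policy)

def upper_bound(partial, remaining):
    """Value so far plus the best possible joint reward for every
    remaining step; optimism is what makes pruning admissible."""
    best_step = max(joint_reward(ja)
                    for ja in product(ACTIONS, repeat=len(AGENTS)))
    return evaluate(partial) + remaining * best_step

def branch_and_bound():
    best_value, best_policy = float("-inf"), None
    stack = [()]   # partially specified joint policies
    while stack:
        partial = stack.pop()
        remaining = HORIZON - len(partial)
        if remaining == 0:
            value = evaluate(partial)
            if value > best_value:
                best_value, best_policy = value, partial
            continue
        if upper_bound(partial, remaining) <= best_value:
            continue   # prune: cannot beat the incumbent
        for ja in product(ACTIONS, repeat=len(AGENTS)):
            stack.append(partial + (ja,))
    return best_value, best_policy

if __name__ == "__main__":
    value, policy = branch_and_bound()
    print("optimal value:", value, "policy:", policy)

Running this prints an optimal value of 2.0 with an anti-coordinated policy (one agent plays 'a' while the other plays 'b' at each step), since the interaction penalty makes the joint action ('a', 'a') unattractive. In the paper's setting the same pruning idea operates over policies conditioned on local states, with CRGs supplying the bounds.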

Published

2016-03-05

How to Cite

Scharpff, J., Roijers, D., Oliehoek, F., Spaan, M., & de Weerdt, M. (2016). Solving Transition-Independent Multi-Agent MDPs with Sparse Interactions. Proceedings of the AAAI Conference on Artificial Intelligence, 30(1). https://doi.org/10.1609/aaai.v30i1.10405

Issue

Vol. 30 No. 1 (2016)

Section

Technical Papers: Planning and Scheduling