Informed Initial Policies for Learning in Dec-POMDPs

Authors

  • Landon Kraemer The University of Southern Mississippi
  • Bikramjit Banerjee The University of Southern Mississippi

DOI:

https://doi.org/10.1609/aaai.v26i1.8426

Abstract

Decentralized partially observable Markov decision processes (Dec-POMDPs) offer a formal model for planning in cooperative multiagent systems where agents operate with noisy sensors and actuators, and only local information. Prevalent Dec-POMDP solution techniques have mostly been centralized and have assumed knowledge of the model. In real-world scenarios, however, solving centrally may not be an option and model parameters may be unknown. To address this, we propose a distributed, model-free algorithm for learning Dec-POMDP policies, in which agents take turns learning, with each agent not currently learning following a static policy. For agents that have not yet learned a policy, this static policy must be initialized. We propose a principled method for learning such initial policies through interaction with the environment. We show that by using such informed initial policies, our alternate learning algorithm can find near-optimal policies for two benchmark problems.
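The alternating scheme described above can be sketched on a toy one-step cooperative task. This is only an illustrative simplification, not the paper's algorithm: the environment, the reward table, and the use of mean-reward estimates as the "informed" initialization are all assumptions made for the sake of a runnable example.

```python
import random

random.seed(0)

# Toy one-step cooperative task (a stand-in for a Dec-POMDP stage):
# reward depends only on the joint action of the two agents.
ACTIONS = [0, 1]

def reward(a0, a1):
    # Coordinating on action 1 is best; miscoordination earns nothing.
    return 1.0 if a0 == a1 == 1 else (0.5 if a0 == a1 == 0 else 0.0)

def informed_initial_policy(agent, other_policy, episodes=200):
    # Estimate each action's mean reward under random exploration while the
    # other agent follows its current policy, then act greedily.
    # A crude, hypothetical stand-in for the paper's initialization method.
    totals = {a: 0.0 for a in ACTIONS}
    counts = {a: 1 for a in ACTIONS}
    for _ in range(episodes):
        a = random.choice(ACTIONS)
        b = other_policy()
        r = reward(a, b) if agent == 0 else reward(b, a)
        totals[a] += r
        counts[a] += 1
    best = max(ACTIONS, key=lambda a: totals[a] / counts[a])
    return lambda: best

def q_learn(agent, other_policy, episodes=500, eps=0.1, alpha=0.2):
    # Model-free learning for one agent while the other holds a static policy.
    q = {a: 0.0 for a in ACTIONS}
    for _ in range(episodes):
        a = random.choice(ACTIONS) if random.random() < eps else max(q, key=q.get)
        b = other_policy()
        r = reward(a, b) if agent == 0 else reward(b, a)
        q[a] += alpha * (r - q[a])
    best = max(q, key=q.get)
    return lambda: best

# Seed both agents with informed initial policies, then alternate:
# each agent learns in turn while its partner's policy stays fixed.
pi = [lambda: random.choice(ACTIONS), lambda: random.choice(ACTIONS)]
pi[0] = informed_initial_policy(0, pi[1])
pi[1] = informed_initial_policy(1, pi[0])
for _ in range(3):
    pi[0] = q_learn(0, pi[1])
    pi[1] = q_learn(1, pi[0])

print(reward(pi[0](), pi[1]()))
```

In this toy setting the informed initialization already steers both agents toward the coordinated joint action, so the alternating learning rounds converge quickly; the paper's experiments make the analogous point on standard Dec-POMDP benchmarks.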

Published

2021-09-20

How to Cite

Kraemer, L., & Banerjee, B. (2021). Informed Initial Policies for Learning in Dec-POMDPs. Proceedings of the AAAI Conference on Artificial Intelligence, 26(1), 2433-2434. https://doi.org/10.1609/aaai.v26i1.8426