Multiagent Learning with a Noisy Global Reward Signal

Authors

  • Scott Proper, Oregon State University
  • Kagan Tumer, Oregon State University

DOI

https://doi.org/10.1609/aaai.v27i1.8580

Keywords

Multiagent Learning, Congestion Problems, Difference Rewards

Abstract

Scaling multiagent reinforcement learning to domains with many agents is a complex problem. In particular, multiagent credit assignment becomes a key issue as the system size increases. Some multiagent systems suffer from a global reward signal that is very noisy or difficult to analyze, which makes deriving a learnable local reward signal very difficult. Difference rewards (a particular instance of reward shaping) have been used to alleviate this concern, but they remain difficult to compute in many domains. In this paper we present an approach to modeling the global reward using function approximation that allows the quick computation of local rewards. We demonstrate how this model can result in significant improvements in behavior for three congestion problems: a multiagent "bar problem", a complex simulation of the United States airspace, and a generic air traffic domain. We show how the model of the global reward may be learned either online or offline using either linear functions or neural networks. For the bar problem, we show an increase in reward of nearly 200% over learning using the global reward directly. For the air traffic problem, we show a decrease in costs of 25% over learning using the global reward directly.
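The idea sketched in the abstract can be made concrete. A difference reward for agent i is D_i = G(z) - G(z_{-i}), where G is the global reward and z_{-i} is the joint state with agent i's contribution replaced by a counterfactual. When G is expensive or noisy, a learned approximation of G can stand in for it. The snippet below is a minimal illustrative sketch, not the paper's implementation: it assumes a linear model G_hat(z) = w . z with one feature per agent (the weights and the counterfactual value of 0 are assumptions for illustration); the paper also considers neural-network models.

```python
import numpy as np

rng = np.random.default_rng(0)

n_agents = 5
# z: joint feature vector, one entry per agent (e.g., attendance in the
# bar problem). Values and weights here are arbitrary for illustration.
z = rng.integers(0, 2, size=n_agents).astype(float)
w = rng.normal(size=n_agents)  # weights of the (assumed) learned model of G

def g_hat(z, w):
    """Approximate global reward under a hypothetical linear model."""
    return float(w @ z)

def difference_reward(z, w, i, counterfactual=0.0):
    """D_i = G_hat(z) - G_hat(z_{-i}): agent i's contribution is
    replaced by a counterfactual value (here, absence: 0)."""
    z_minus_i = z.copy()
    z_minus_i[i] = counterfactual
    return g_hat(z, w) - g_hat(z_minus_i, w)

# For a linear model, D_i reduces to w_i * (z_i - c_i), so once the
# model is fit, each agent's local reward is cheap to compute.
for i in range(n_agents):
    assert abs(difference_reward(z, w, i) - w[i] * z[i]) < 1e-9
```

Under the linear assumption each D_i costs O(1) to evaluate, which is the practical payoff of modeling G rather than querying it directly.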

Published

2013-06-30

How to Cite

Proper, S., & Tumer, K. (2013). Multiagent Learning with a Noisy Global Reward Signal. Proceedings of the AAAI Conference on Artificial Intelligence, 27(1), 826-832. https://doi.org/10.1609/aaai.v27i1.8580