Multi-Agent Tree Search with Dynamic Reward Shaping

Alvaro Velasquez; Brett Bissey; Lior Barak; Daniel Melcer; Andre Beckus; Ismail Alkhouri; George Atia

doi:10.1609/icaps.v32i1.19854

Authors

Alvaro Velasquez, Air Force Research Laboratory
Brett Bissey, MITRE
Lior Barak, University of Central Florida
Daniel Melcer, Northeastern University
Andre Beckus, Air Force Research Laboratory
Ismail Alkhouri, University of Central Florida
George Atia, University of Central Florida

DOI:

https://doi.org/10.1609/icaps.v32i1.19854

Keywords:

Multi-Agent Planning, Multi-Agent Reinforcement Learning, Planning And Learning, Reward Shaping, Monte Carlo Tree Search, Mixed Cooperative-Competitive, Emergent Behavior, Linear Temporal Logic, Non-Markovian Reinforcement Learning

Abstract

Sparse rewards and their representation in multi-agent domains remain a challenge for the development of multi-agent planning systems. While techniques from formal methods can be adopted to represent the underlying planning objectives, their use in facilitating and accelerating learning has received limited attention in multi-agent settings. Reward shaping methods that leverage such formal representations in single-agent settings are typically static in the sense that the artificial rewards remain the same throughout the entire learning process. In contrast, we investigate the use of such formal objective representations to define novel reward shaping functions that capture the learned experience of the agents. More specifically, we leverage the automaton representation of the underlying team objectives in mixed cooperative-competitive domains such that each automaton transition is assigned an expected value proportional to the frequency with which it was observed in successful trajectories of past behavior. This form of dynamic reward shaping is proposed within a multi-agent tree search architecture wherein agents can simultaneously reason about the future behavior of other agents as well as their own future behavior.
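The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of frequency-based shaping over automaton transitions, not the authors' method. The class name `TransitionFrequencyShaper`, the `scale` parameter, the choice to count each transition at most once per successful trajectory, and the choice to ignore self-loops are all assumptions made for illustration.

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple

# A transition between two automaton states (hypothetical representation).
Transition = Tuple[Hashable, Hashable]


class TransitionFrequencyShaper:
    """Illustrative helper: shaping values from transition frequencies.

    Assumption: the value of a transition is the fraction of successful
    trajectories seen so far that contained it, multiplied by `scale`.
    """

    def __init__(self, scale: float = 1.0) -> None:
        self.scale = scale
        self.counts: Counter = Counter()  # transition -> successful episodes containing it
        self.successes = 0                # number of successful episodes recorded

    def record(self, automaton_path: Iterable[Hashable], success: bool) -> None:
        """Update statistics from one episode's sequence of automaton states."""
        if not success:
            return  # only successful trajectories contribute
        path = list(automaton_path)
        self.successes += 1
        # Count each distinct transition once per trajectory and skip
        # self-loops, so that staying in place earns no bonus (assumption).
        for q, q_next in set(zip(path, path[1:])):
            if q != q_next:
                self.counts[(q, q_next)] += 1

    def value(self, transition: Transition) -> float:
        """Current shaping value; it changes as more episodes are recorded."""
        if self.successes == 0:
            return 0.0
        return self.scale * self.counts[transition] / self.successes

    def shaped_reward(self, env_reward: float, q: Hashable, q_next: Hashable) -> float:
        """Environment reward plus the bonus for the automaton transition taken."""
        return env_reward + self.value((q, q_next))


# Usage with made-up automaton states q0, q1, q2.
shaper = TransitionFrequencyShaper()
shaper.record(["q0", "q1", "q2"], success=True)
shaper.record(["q0", "q2"], success=True)
shaper.record(["q0", "q1"], success=False)  # ignored: not a success
bonus = shaper.value(("q0", "q1"))          # seen in 1 of 2 successful episodes
```

Because the counts are updated after every episode, the bonus attached to a transition drifts with the agents' experience, which is the sense in which such shaping is dynamic rather than static. How these values would enter the tree search (for example, in rollout returns or node value estimates) is not stated in the abstract and is left open here.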
