Improving Zero-Shot Coordination Performance Based on Policy Similarity

Authors

  • Lebin Yu, Tsinghua University
  • Yunbo Qiu, Tsinghua University
  • Quanming Yao, Tsinghua University
  • Xudong Zhang, Tsinghua University
  • Jian Wang, Tsinghua University

DOI:

https://doi.org/10.1609/icaps.v33i1.27223

Keywords:

Multi-agent and distributed planning, Partially observable and unobservable domains, Learning for planning and scheduling

Abstract

In recent years, multi-agent reinforcement learning has achieved remarkable performance in multi-agent planning and scheduling tasks. It typically follows the self-play setting, where agents are trained by playing with a fixed group of agents. However, in the face of zero-shot coordination, where an agent must coordinate with unseen partners, self-play agents may fail. Several methods have been proposed to handle this problem, but they are either time-consuming or lack generalizability. In this paper, we first reveal an important phenomenon: zero-shot coordination performance is strongly linearly correlated with the similarity between an agent's training partner and its testing partner. Inspired by this finding, we propose a Similarity-Based Robust Training (SBRT) scheme that improves agents' zero-shot coordination performance by perturbing their partners' actions during training according to a pre-defined policy similarity value. To validate its effectiveness, we apply our scheme to three multi-agent reinforcement learning frameworks and achieve better performance than previous methods.
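The abstract describes SBRT only at a high level: a partner's actions are perturbed during training so that its behavior stays within a pre-defined policy similarity of the original policy. As a rough illustration only, and not the authors' exact procedure, one simple way to realize such a perturbation is to keep the partner's chosen action with probability equal to the target similarity and otherwise substitute a uniformly random different action. The function below is a hypothetical sketch under that assumption; all names in it are illustrative.

```python
import numpy as np

def perturb_partner_action(partner_action: int,
                           num_actions: int,
                           target_similarity: float,
                           rng: np.random.Generator) -> int:
    """Illustrative sketch of similarity-based action perturbation.

    With probability `target_similarity`, keep the partner's original
    action; otherwise replace it with a uniformly random different
    action. Under this scheme the perturbed policy agrees with the
    original policy with probability `target_similarity` on average.
    """
    if rng.random() < target_similarity:
        return partner_action
    # Substitute a random action different from the original one.
    alternatives = [a for a in range(num_actions) if a != partner_action]
    return int(rng.choice(alternatives))

# Example: during training, the learning agent would observe the
# perturbed action in place of the partner's original action.
rng = np.random.default_rng(0)
noisy_action = perturb_partner_action(partner_action=2,
                                      num_actions=5,
                                      target_similarity=0.8,
                                      rng=rng)
```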

Published

2023-07-01

How to Cite

Yu, L., Qiu, Y., Yao, Q., Zhang, X., & Wang, J. (2023). Improving Zero-Shot Coordination Performance Based on Policy Similarity. Proceedings of the International Conference on Automated Planning and Scheduling, 33(1), 438-442. https://doi.org/10.1609/icaps.v33i1.27223