Improving Zero-Shot Coordination Performance Based on Policy Similarity

Authors

  • Lebin Yu, Tsinghua University
  • Yunbo Qiu, Tsinghua University
  • Quanming Yao, Tsinghua University
  • Xudong Zhang, Tsinghua University
  • Jian Wang, Tsinghua University

DOI:

https://doi.org/10.1609/icaps.v33i1.27223

Keywords:

Multi-agent and distributed planning, Partially observable and unobservable domains, Learning for planning and scheduling

Abstract

In recent years, multi-agent reinforcement learning has achieved remarkable performance in multi-agent planning and scheduling tasks. It typically follows the self-play setting, where agents are trained by playing with a fixed group of agents. However, in the face of zero-shot coordination, where an agent must coordinate with unseen partners, self-play agents may fail. Several methods have been proposed to handle this problem, but they are either time-consuming or lack generalizability. In this paper, we first reveal an important phenomenon: zero-shot coordination performance is strongly linearly correlated with the similarity between an agent's training partner and its testing partner. Inspired by this finding, we propose a Similarity-Based Robust Training (SBRT) scheme that improves agents' zero-shot coordination performance by perturbing their partners' actions during training according to a pre-defined policy similarity value. To validate its effectiveness, we apply our scheme to three multi-agent reinforcement learning frameworks and achieve better performance than previous methods.
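The abstract describes SBRT only at a high level: a partner's actions are perturbed during training so that its behavior stays within a pre-defined policy similarity of the original policy. As a rough illustration only, and not the authors' exact procedure, one simple way to realize such a perturbation is to keep the partner's chosen action with probability equal to the target similarity and otherwise substitute a uniformly random different action. The function below is a hypothetical sketch under that assumption; all names in it are illustrative.

```python
import numpy as np

def perturb_partner_action(partner_action: int,
                           num_actions: int,
                           target_similarity: float,
                           rng: np.random.Generator) -> int:
    """Illustrative sketch of similarity-based action perturbation.

    With probability `target_similarity`, keep the partner's original
    action; otherwise replace it with a uniformly random different
    action. Under this scheme the perturbed policy agrees with the
    original policy with probability `target_similarity` on average.
    """
    if rng.random() < target_similarity:
        return partner_action
    # Substitute a random action different from the original one.
    alternatives = [a for a in range(num_actions) if a != partner_action]
    return int(rng.choice(alternatives))

# Example: during training, the learning agent would observe the
# perturbed action in place of the partner's original action.
rng = np.random.default_rng(0)
noisy_action = perturb_partner_action(partner_action=2,
                                      num_actions=5,
                                      target_similarity=0.8,
                                      rng=rng)
```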

Published

2023-07-01

How to Cite

Yu, L., Qiu, Y., Yao, Q., Zhang, X., & Wang, J. (2023). Improving Zero-Shot Coordination Performance Based on Policy Similarity. Proceedings of the International Conference on Automated Planning and Scheduling, 33(1), 438-442. https://doi.org/10.1609/icaps.v33i1.27223