Maximum Entropy Population-Based Training for Zero-Shot Human-AI Coordination

Rui Zhao; Jinming Song; Yufeng Yuan; Haifeng Hu; Yang Gao; Yi Wu; Zhongqian Sun; Wei Yang

doi:10.1609/aaai.v37i5.25758

Authors

Rui Zhao, Tencent AI Lab
Jinming Song, Tencent AI Lab
Yufeng Yuan, Tencent AI Lab
Haifeng Hu, Tencent AI Lab
Yang Gao, Tsinghua University
Yi Wu, Tsinghua University
Zhongqian Sun, Tencent AI Lab
Wei Yang, Tencent AI Lab

DOI:

https://doi.org/10.1609/aaai.v37i5.25758

Keywords:

HAI: Games, Virtual Humans, and Autonomous Characters, ML: Reinforcement Learning Algorithms, MAS: Coordination and Collaboration

Abstract

We study the problem of training a Reinforcement Learning (RL) agent that collaborates with humans without using human data. Although such agents can be obtained through self-play training, they can suffer significantly from distributional shift when paired with unencountered partners, such as humans. In this paper, we propose Maximum Entropy Population-based training (MEP) to mitigate this distributional shift. In MEP, agents in the population are trained with our derived Population Entropy bonus, which promotes both pairwise diversity between agents and the individual diversity of each agent. After obtaining this diversified population, a common best agent is trained by pairing it with agents in the population via prioritized sampling, where the prioritization is dynamically adjusted based on training progress. We demonstrate the effectiveness of MEP in comparison with Self-Play PPO (SP), Population-Based Training (PBT), Trajectory Diversity (TrajeDi), and Fictitious Co-Play (FCP) in both matrix game and Overcooked game environments, with human proxy models and real humans as partners. A supplementary video showing experimental results is available at https://youtu.be/Xh-FKD0AAKE.
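
The two ingredients named in the abstract, a population-level entropy bonus and prioritized partner sampling, can be illustrated with a minimal sketch. The abstract does not give the formulas, so the code below rests on two labeled assumptions: the bonus is taken to be the entropy of the population's mean policy at a single state, and partners are prioritized by rank so that those with lower average return are sampled more often. All function names and the `beta` parameter are hypothetical and are not taken from the authors' code.

```python
import math
import random


def mean_policy(policies):
    """Average the action distributions of all agents at one state."""
    n = len(policies)
    num_actions = len(policies[0])
    return [sum(p[a] for p in policies) / n for a in range(num_actions)]


def entropy(dist):
    """Shannon entropy in nats; zero-probability actions contribute nothing."""
    return -sum(p * math.log(p) for p in dist if p > 0.0)


def population_entropy(policies):
    """ASSUMPTION: the bonus is the entropy of the population's mean policy."""
    return entropy(mean_policy(policies))


def partner_probs(avg_returns, beta=1.0):
    """ASSUMPTION: rank-based priorities, lowest return gets the highest rank.

    Each partner is weighted by rank ** beta, then weights are normalized.
    """
    # Sort partner indices from highest to lowest average return.
    order = sorted(range(len(avg_returns)),
                   key=lambda i: avg_returns[i], reverse=True)
    ranks = [0] * len(avg_returns)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    weights = [r ** beta for r in ranks]
    total = sum(weights)
    return [w / total for w in weights]


def sample_partner(avg_returns, beta=1.0, rng=random):
    """Draw one partner index according to the rank-based probabilities."""
    probs = partner_probs(avg_returns, beta)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Under these assumptions, two identical deterministic policies yield a bonus of zero, while two deterministic policies that choose different actions yield log 2, so the bonus rewards disagreement between agents as well as randomness within each agent. Setting `beta` to 0 gives uniform partner sampling, and recomputing `avg_returns` during training is one way the prioritization could track training progress.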
