Maximum Entropy Population-Based Training for Zero-Shot Human-AI Coordination


  • Rui Zhao Tencent AI Lab
  • Jinming Song Tencent AI Lab
  • Yufeng Yuan Tencent AI Lab
  • Haifeng Hu Tencent AI Lab
  • Yang Gao Tsinghua University
  • Yi Wu Tsinghua University
  • Zhongqian Sun Tencent AI Lab
  • Wei Yang Tencent AI Lab



HAI: Games, Virtual Humans, and Autonomous Characters, ML: Reinforcement Learning Algorithms, MAS: Coordination and Collaboration


We study the problem of training a Reinforcement Learning (RL) agent that can collaborate with humans without using human data. Although such agents can be obtained through self-play training, they suffer significantly from distributional shift when paired with unseen partners, such as humans. In this paper, we propose Maximum Entropy Population-based training (MEP) to mitigate this distributional shift. In MEP, agents in the population are trained with our derived Population Entropy bonus, which promotes both the pairwise diversity between agents and the individual diversity of each agent. After obtaining this diversified population, a common best agent is trained by pairing with agents in the population via prioritized sampling, where the prioritization is dynamically adjusted based on training progress. We demonstrate the effectiveness of MEP by comparing it with Self-Play PPO (SP), Population-Based Training (PBT), Trajectory Diversity (TrajeDi), and Fictitious Co-Play (FCP) in both a matrix game and the Overcooked game environment, with partners ranging from human proxy models to real humans. A supplementary video showing experimental results is available at
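As a rough illustration of the Population Entropy bonus idea, the sketch below computes the Shannon entropy of a population's mean action distribution at a given state. This is a minimal assumption-laden sketch, not the authors' implementation: the function name and the choice of Shannon entropy over the averaged policy are illustrative.

```python
import numpy as np

def population_entropy_bonus(action_probs):
    """Entropy of the population's mean policy at one state.

    action_probs: array of shape (n_agents, n_actions), where row i is
    agent i's action distribution at the state. A diverse population
    (agents preferring different actions) yields a high bonus; a
    homogeneous population yields a low one.
    """
    mean_policy = action_probs.mean(axis=0)
    eps = 1e-12  # guard against log(0) for zero-probability actions
    return -np.sum(mean_policy * np.log(mean_policy + eps))

# Two agents with opposite deterministic policies mix to a uniform
# distribution, so the bonus is maximal (log 2 for two actions):
diverse = np.array([[1.0, 0.0], [0.0, 1.0]])
identical = np.array([[1.0, 0.0], [1.0, 0.0]])
print(population_entropy_bonus(diverse))    # ≈ log 2 ≈ 0.693
print(population_entropy_bonus(identical))  # ≈ 0
```

Adding such a bonus to each agent's reward during population training pushes agents apart in behavior space, which is what gives the final best-response agent a broad set of partners to train against.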




How to Cite

Zhao, R., Song, J., Yuan, Y., Hu, H., Gao, Y., Wu, Y., Sun, Z., & Yang, W. (2023). Maximum Entropy Population-Based Training for Zero-Shot Human-AI Coordination. Proceedings of the AAAI Conference on Artificial Intelligence, 37(5), 6145-6153.



AAAI Technical Track on Humans and AI