Learning Diverse Risk Preferences in Population-Based Self-Play

Authors

  • Yuhua Jiang Department of Automation, Tsinghua University
  • Qihan Liu Department of Automation, Tsinghua University
  • Xiaoteng Ma Department of Automation, Tsinghua University
  • Chenghao Li Department of Automation, Tsinghua University
  • Yiqin Yang Department of Automation, Tsinghua University
  • Jun Yang Department of Automation, Tsinghua University
  • Bin Liang Department of Automation, Tsinghua University
  • Qianchuan Zhao Department of Automation, Tsinghua University

DOI:

https://doi.org/10.1609/aaai.v38i11.29188

Keywords:

ML: Reinforcement Learning, MAS: Adversarial Agents

Abstract

Among the remarkable successes of Reinforcement Learning (RL), self-play algorithms have played a crucial role in solving competitive games. However, current self-play RL methods commonly optimize the agent to maximize the expected win-rate against its current or historical copies, resulting in a limited strategy style and a tendency to get stuck in local optima. To address this limitation, it is important to improve the diversity of policies, allowing the agent to break stalemates and enhancing its robustness when facing different opponents. In this paper, we present a novel perspective to promote diversity by considering that agents could have diverse risk preferences in the face of uncertainty. To achieve this, we introduce a novel reinforcement learning algorithm called Risk-sensitive Proximal Policy Optimization (RPPO), which smoothly interpolates between worst-case and best-case policy learning, enabling policy learning with desired risk preferences. Furthermore, by seamlessly integrating RPPO with population-based self-play, agents in the population optimize dynamic risk-sensitive objectives using experiences gained from playing against diverse opponents. Our empirical results demonstrate that our method achieves comparable or superior performance in competitive games and, importantly, leads to the emergence of diverse behavioral modes. Code is available at https://github.com/Jackory/RPBT.
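To make the idea of "smoothly interpolating between worst-case and best-case policy learning" concrete, the sketch below shows one generic way a risk-sensitive value estimate can be tilted from risk-averse to risk-seeking via an exponential weighting over sampled returns. This is an illustrative sketch only, not the RPPO objective from the paper; the function name, the parameter `beta`, and the weighting scheme are assumptions introduced here for exposition.

```python
# Illustrative sketch (not the paper's implementation): a risk-sensitive
# value estimate that smoothly interpolates between worst-case and
# best-case evaluation via an exponential (Boltzmann) weighting over
# sampled returns. The hypothetical parameter `beta` controls risk
# preference: beta -> -inf approaches the minimum return (worst case),
# beta = 0 gives the plain mean (risk-neutral), beta -> +inf approaches
# the maximum return (best case).
import numpy as np


def risk_weighted_value(sampled_returns: np.ndarray, beta: float) -> float:
    """Exponentially weighted average of sampled returns.

    beta < 0: risk-averse (low returns weighted more heavily)
    beta = 0: risk-neutral (ordinary expectation)
    beta > 0: risk-seeking (high returns weighted more heavily)
    """
    if beta == 0.0:
        return float(np.mean(sampled_returns))
    # Subtract the max logit before exponentiating for numerical stability.
    logits = beta * sampled_returns
    logits -= logits.max()
    weights = np.exp(logits)
    weights /= weights.sum()
    return float(np.dot(weights, sampled_returns))


# Example: returns sampled from a value distribution for one state-action pair.
returns = np.array([-1.0, 0.0, 0.5, 2.0])
print(risk_weighted_value(returns, beta=-5.0))  # close to the worst return
print(risk_weighted_value(returns, beta=0.0))   # risk-neutral mean
print(risk_weighted_value(returns, beta=5.0))   # close to the best return
```

In a population-based setup, each agent could in principle be assigned its own risk parameter (here, a different `beta`), so that members of the population evaluate the same experience under different risk preferences and develop distinct behavioral modes.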

Published

2024-03-24

How to Cite

Jiang, Y., Liu, Q., Ma, X., Li, C., Yang, Y., Yang, J., Liang, B., & Zhao, Q. (2024). Learning Diverse Risk Preferences in Population-Based Self-Play. Proceedings of the AAAI Conference on Artificial Intelligence, 38(11), 12910-12918. https://doi.org/10.1609/aaai.v38i11.29188

Issue

Vol. 38 No. 11 (2024)

Section

AAAI Technical Track on Machine Learning II