Learning to Shape Rewards Using a Game of Two Partners

Authors

  • David Mguni Huawei
  • Taher Jafferjee Huawei
  • Jianhong Wang University of Manchester
  • Nicolas Perez-Nieves Imperial College London
  • Wenbin Song ShanghaiTech University
  • Feifei Tong Huawei
  • Matthew Taylor University of Alberta; Alberta Machine Intelligence Institute
  • Tianpei Yang University of Alberta; Alberta Machine Intelligence Institute
  • Zipeng Dai Huawei
  • Hui Chen UCL
  • Jiangcheng Zhu Huawei
  • Kun Shao Huawei
  • Jun Wang UCL
  • Yaodong Yang Peking University

DOI:

https://doi.org/10.1609/aaai.v37i10.26371

Keywords:

MAS: Multiagent Learning, MAS: Coordination and Collaboration, MAS: Distributed Problem Solving

Abstract

Reward shaping (RS) is a powerful method in reinforcement learning (RL) for overcoming the problem of sparse or uninformative rewards. However, RS typically relies on manually engineered shaping-reward functions whose construction is time-consuming and error-prone. It also requires domain knowledge, which runs contrary to the goal of autonomous learning. We introduce Reinforcement Learning Optimising Shaping Algorithm (ROSA), an automated reward-shaping framework in which the shaping-reward function is constructed in a Markov game between two agents. A reward-shaping agent (Shaper) uses switching controls to determine the states at which to add shaping rewards for more efficient learning, while the other agent (Controller) learns the optimal policy for the task using these shaped rewards. We prove that ROSA, which adopts existing RL algorithms, learns to construct a shaping-reward function that is beneficial to the task, thus ensuring efficient convergence to high-performance policies. We demonstrate ROSA’s properties in three didactic experiments and show its superior performance against state-of-the-art RS algorithms in challenging sparse-reward environments.
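
The division of labour between Shaper and Controller can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Shaper` class, the `shaped_reward` helper, the five-state chain, and the hand-written switching and bonus rules are all hypothetical, chosen only to show how a switching control decides whether a shaping reward is added to the task reward at a given state. The abstract describes the shaping-reward function as learned; the training of either agent is not shown here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Shaper:
    """Hypothetical stand-in for the reward-shaping agent."""
    switch: Callable[[int], bool]   # switching control: shape at this state?
    bonus: Callable[[int], float]   # shaping-reward value when switched on

    def shaping_reward(self, state: int) -> float:
        # A shaping reward is added only where the switch is on.
        return self.bonus(state) if self.switch(state) else 0.0


def shaped_reward(env_reward: float, shaper: Shaper, state: int) -> float:
    """Reward the Controller would train on: task reward plus shaping reward."""
    return env_reward + shaper.shaping_reward(state)


# Toy sparse-reward chain (an assumption): states 0..4, reward only at the goal.
GOAL = 4


def env_reward(state: int) -> float:
    return 1.0 if state == GOAL else 0.0


# Hand-written rules purely for illustration, not learned as in the paper.
shaper = Shaper(switch=lambda s: s in (2, 3), bonus=lambda s: 0.1 * s)

rewards = [shaped_reward(env_reward(s), shaper, s) for s in range(GOAL + 1)]
print(rewards)
```

In this toy setting the task reward is zero everywhere except the goal, and the shaping reward supplies a signal only at the two states where the switch is on, leaving the rest of the reward function untouched.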

Published

2023-06-26

How to Cite

Mguni, D., Jafferjee, T., Wang, J., Perez-Nieves, N., Song, W., Tong, F., Taylor, M., Yang, T., Dai, Z., Chen, H., Zhu, J., Shao, K., Wang, J., & Yang, Y. (2023). Learning to Shape Rewards Using a Game of Two Partners. Proceedings of the AAAI Conference on Artificial Intelligence, 37(10), 11604-11612. https://doi.org/10.1609/aaai.v37i10.26371

Section

AAAI Technical Track on Multiagent Systems