Learning Task-Distribution Reward Shaping with Meta-Learning

Haosheng Zou; Tongzheng Ren; Dong Yan; Hang Su; Jun Zhu

doi:10.1609/aaai.v35i12.17337

Authors

Haosheng Zou Tsinghua University
Tongzheng Ren UT Austin
Dong Yan Tsinghua University
Hang Su Tsinghua University
Jun Zhu Tsinghua University

DOI:

https://doi.org/10.1609/aaai.v35i12.17337

Keywords:

Reinforcement Learning, Transfer/Adaptation/Multi-task/Meta/Automated Learning, (Deep) Neural Network Algorithms, Games

Abstract

Reward shaping is one of the most effective methods to tackle the crucial yet challenging problem of credit assignment and accelerate Reinforcement Learning. However, designing shaping functions usually requires rich expert knowledge and hand-engineering, and the difficulties are further exacerbated when there are multiple tasks to solve. In this paper, we consider reward shaping on a distribution of tasks that share state spaces but not necessarily action spaces. We provide insights into optimal reward shaping, and propose a novel meta-learning framework that automatically learns such reward shaping and applies it to newly sampled tasks. Theoretical analysis and extensive experiments establish our method as the state of the art in learning task-distribution reward shaping, outperforming previous work of this kind (Konidaris and Barto 2006; Snel and Whiteson 2014). We further show that our method outperforms learning intrinsic rewards (Yang et al. 2019; Zheng et al. 2020), outperforms Rainbow (Hessel et al. 2018) in complex pixel-based CoinRun games, and is also better than hand-designed reward shaping on grids. While the goal of this paper is to learn reward shaping rather than to propose new general meta-learning algorithms such as PEARL (Rakelly et al. 2019) or MQL (Fakoor et al. 2020), our framework based on MAML (Finn, Abbeel, and Levine 2017) also outperforms PEARL / MQL, and could be combined with them for further improvement.
