Learning Task-Distribution Reward Shaping with Meta-Learning


  • Haosheng Zou Tsinghua University
  • Tongzheng Ren UT Austin
  • Dong Yan Tsinghua University
  • Hang Su Tsinghua University
  • Jun Zhu Tsinghua University


Reinforcement Learning, Transfer/Adaptation/Multi-task/Meta/Automated Learning, (Deep) Neural Network Algorithms, Games


Reward shaping is one of the most effective methods for tackling the crucial yet challenging problem of credit assignment and accelerating Reinforcement Learning. However, designing shaping functions usually requires rich expert knowledge and hand-engineering, and the difficulty is further exacerbated when multiple tasks must be solved. In this paper, we consider reward shaping on a distribution of tasks that share state spaces but not necessarily action spaces. We provide insights into optimal reward shaping, and propose a novel meta-learning framework that automatically learns such reward shaping and applies it to newly sampled tasks. Theoretical analysis and extensive experiments establish our method as the state of the art in learning task-distribution reward shaping, outperforming previous such works (Konidaris and Barto 2006; Snel and Whiteson 2014). We further show that our method outperforms learning intrinsic rewards (Yang et al. 2019; Zheng et al. 2020), outperforms Rainbow (Hessel et al. 2018) in complex pixel-based CoinRun games, and is also better than hand-designed reward shaping on grids. While the goal of this paper is to learn reward shaping rather than to propose new general meta-learning algorithms such as PEARL (Rakelly et al. 2019) or MQL (Fakoor et al. 2020), our framework based on MAML (Finn, Abbeel, and Levine 2017) also outperforms PEARL / MQL, and could be combined with them for further improvement.
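
For readers unfamiliar with the term, the following is a minimal sketch of hand-designed reward shaping on a grid, assuming the standard potential-based form r + gamma * phi(s') - phi(s). The abstract does not state which form the paper uses, and the names `manhattan_potential` and `shaped_reward`, the goal cell, and the discount value are all hypothetical choices made for illustration. It is not the meta-learned shaping the paper proposes.

```python
from typing import Callable, Tuple

State = Tuple[int, int]  # (row, col) cell in a grid world; illustrative only


def manhattan_potential(goal: State) -> Callable[[State], float]:
    """Hand-designed potential: negative Manhattan distance to the goal."""
    def phi(s: State) -> float:
        return -float(abs(s[0] - goal[0]) + abs(s[1] - goal[1]))
    return phi


def shaped_reward(r: float, s: State, s_next: State,
                  phi: Callable[[State], float], gamma: float = 0.99) -> float:
    """Environment reward plus the potential-based shaping term."""
    return r + gamma * phi(s_next) - phi(s)


# Hypothetical 4x4 grid with the goal in the far corner.
phi = manhattan_potential((3, 3))

# With gamma = 1, a step toward the goal earns +1 and a step away earns -1,
# so the shaping terms cancel around any loop.
toward = shaped_reward(0.0, (0, 0), (0, 1), phi, gamma=1.0)
away = shaped_reward(0.0, (0, 1), (0, 0), phi, gamma=1.0)
print(toward, away)
```

A potential like this must be designed by hand for each task, which is the expert effort the abstract says becomes harder as the number of tasks grows.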




How to Cite

Zou, H., Ren, T., Yan, D., Su, H., & Zhu, J. (2021). Learning Task-Distribution Reward Shaping with Meta-Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 35(12), 11210-11218. Retrieved from https://ojs.aaai.org/index.php/AAAI/article/view/17337



AAAI Technical Track on Machine Learning V