Learning Noise-Induced Reward Functions for Surpassing Demonstrations in Imitation Learning

Authors

  • Liangyu Huo, Beihang University
  • Zulin Wang, Beihang University
  • Mai Xu, Beihang University

DOI:

https://doi.org/10.1609/aaai.v37i7.25962

Keywords:

ML: Imitation Learning & Inverse Reinforcement Learning, ML: Reinforcement Learning Algorithms, ML: Learning Preferences or Rankings

Abstract

Imitation learning (IL) has recently shown impressive performance in training a reinforcement learning agent with human demonstrations, eliminating the difficulty of designing elaborate reward functions in complex environments. However, most IL methods work under the assumption that the demonstrations are optimal and thus cannot learn policies that surpass the demonstrators. Some methods have been investigated to obtain better-than-demonstration (BD) performance with additional human feedback or preference labels. In this paper, we propose a method to learn rewards from suboptimal demonstrations via a weighted preference learning technique (LERP). Specifically, we first formulate the suboptimality of demonstrations as an inaccurate estimation of rewards. The inaccuracy is modeled with a reward noise random variable following the Gumbel distribution. Moreover, we derive an upper bound of the expected return under different noise coefficients and propose a theorem for surpassing the demonstrations. Unlike existing literature, our analysis does not depend on a linear reward constraint. Consequently, we develop a BD model with a weighted preference learning technique. Experimental results on continuous control and high-dimensional discrete control tasks show the superiority of our LERP method over other state-of-the-art BD methods.
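A well-known consequence of modeling reward noise with a Gumbel distribution, as the abstract describes, is that pairwise trajectory preferences follow a logistic (softmax) probability. The sketch below (illustrative only; the scale parameter `beta` and the return values are assumptions, not taken from the paper) checks this equivalence numerically.

```python
import numpy as np

rng = np.random.default_rng(0)
beta = 1.0          # Gumbel noise scale (hypothetical coefficient)
r_a, r_b = 1.5, 0.5  # true returns of two trajectories (made-up values)

# Perturb each return with i.i.d. Gumbel noise and record how often
# trajectory A "wins" the comparison.
n = 200_000
wins_a = np.mean(
    r_a + rng.gumbel(scale=beta, size=n)
    > r_b + rng.gumbel(scale=beta, size=n)
)

# Under Gumbel noise, the theoretical win rate is a softmax over returns.
softmax_a = np.exp(r_a / beta) / (np.exp(r_a / beta) + np.exp(r_b / beta))

print(f"empirical: {wins_a:.3f}, softmax: {softmax_a:.3f}")
```

The empirical win rate converges to the softmax probability, which is why Gumbel-distributed reward noise pairs naturally with preference-learning objectives of the Bradley-Terry form.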

Published

2023-06-26

How to Cite

Huo, L., Wang, Z., & Xu, M. (2023). Learning Noise-Induced Reward Functions for Surpassing Demonstrations in Imitation Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 37(7), 7953-7961. https://doi.org/10.1609/aaai.v37i7.25962

Section

AAAI Technical Track on Machine Learning II