Enhancing Online Reinforcement Learning with Meta-Learned Objective from Offline Data

Authors

  • Shilong Deng, University of Electronic Science and Technology of China, Chengdu, China
  • Zetao Zheng, University of Electronic Science and Technology of China, Chengdu, China; Sichuan Artificial Intelligence Research Institute, Yibin, China
  • Hongcai He, University of Electronic Science and Technology of China, Chengdu, China
  • Paul Weng, Data Science Research Center, Duke Kunshan University, Kunshan, China
  • Jie Shao, University of Electronic Science and Technology of China, Chengdu, China; Sichuan Artificial Intelligence Research Institute, Yibin, China

DOI:

https://doi.org/10.1609/aaai.v39i15.33784

Abstract

A major challenge in Reinforcement Learning (RL) is the difficulty of learning an optimal policy from sparse rewards. Prior works enhance online RL with conventional Imitation Learning (IL) via a handcrafted auxiliary objective, at the cost of restricting the RL policy to be sub-optimal when the offline data is generated by a non-expert policy. Instead, to better leverage the valuable information in offline data, we develop Generalized Imitation Learning from Demonstration (GILD), which meta-learns an objective that distills knowledge from offline data and instills intrinsic motivation towards the optimal policy. Unlike prior works that are tied to a specific RL algorithm, GILD is a flexible module intended for diverse vanilla off-policy RL algorithms. In addition, GILD introduces no domain-specific hyperparameters and only a minimal increase in computational cost. On four challenging MuJoCo tasks with sparse rewards, we show that three RL algorithms enhanced with GILD significantly outperform state-of-the-art methods.
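The bi-level structure the abstract describes can be illustrated with a deliberately simplified sketch. This is not the paper's actual GILD algorithm; it is a toy one-parameter example of the general idea: an inner policy update optimizes the RL loss plus a weighted imitation term, while an outer (meta) update tunes that weight so that post-update pure-RL performance improves. All losses, targets, and step sizes below are hypothetical choices for illustration.

```python
# Toy illustration of a meta-learned auxiliary objective (NOT the paper's
# GILD implementation). A scalar policy parameter theta is pulled toward a
# sub-optimal demonstrator at 1.5 by imitation, while the true optimum is 2.0;
# the meta-update adapts the imitation weight w so the policy is not stuck
# at the demonstrator's behavior.

def rl_loss(theta):
    # Proxy for negative RL return; the optimal policy sits at theta = 2.0.
    return (theta - 2.0) ** 2

def imitation_loss(theta):
    # Offline data comes from a sub-optimal demonstrator at theta = 1.5.
    return (theta - 1.5) ** 2

def inner_step(theta, w, eta=0.1):
    # One gradient step on the combined objective rl_loss + w * imitation_loss,
    # using the analytic gradients of the quadratics above.
    grad = 2.0 * (theta - 2.0) + w * 2.0 * (theta - 1.5)
    return theta - eta * grad

theta, w = 0.0, 1.0   # policy parameter and meta-learned imitation weight
meta_lr, eps = 0.5, 1e-4

for _ in range(500):
    # Outer (meta) update: finite-difference gradient of the post-update RL
    # loss with respect to w. Early on, imitation helps and w grows; once it
    # starts anchoring the policy to the demonstrator, w shrinks toward zero.
    g = (rl_loss(inner_step(theta, w + eps)) -
         rl_loss(inner_step(theta, w - eps))) / (2.0 * eps)
    w -= meta_lr * g
    # Inner update: policy step under the current meta-learned objective.
    theta = inner_step(theta, w)

print(theta, w)  # theta approaches the optimum 2.0, not the demonstrator's 1.5
```

A fixed handcrafted weight would leave the policy at a compromise between 1.5 and 2.0; the meta-update is what lets the learner exploit the demonstrations early and then move past the non-expert behavior, mirroring the motivation stated in the abstract.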

Published

2025-04-11

How to Cite

Deng, S., Zheng, Z., He, H., Weng, P., & Shao, J. (2025). Enhancing Online Reinforcement Learning with Meta-Learned Objective from Offline Data. Proceedings of the AAAI Conference on Artificial Intelligence, 39(15), 16244–16252. https://doi.org/10.1609/aaai.v39i15.33784

Section

AAAI Technical Track on Machine Learning I