Enhancing Online Reinforcement Learning with Meta-Learned Objective from Offline Data

Authors

  • Shilong Deng, University of Electronic Science and Technology of China, Chengdu, China
  • Zetao Zheng, University of Electronic Science and Technology of China, Chengdu, China; Sichuan Artificial Intelligence Research Institute, Yibin, China
  • Hongcai He, University of Electronic Science and Technology of China, Chengdu, China
  • Paul Weng, Data Science Research Center, Duke Kunshan University, Kunshan, China
  • Jie Shao, University of Electronic Science and Technology of China, Chengdu, China; Sichuan Artificial Intelligence Research Institute, Yibin, China

DOI:

https://doi.org/10.1609/aaai.v39i15.33784

Abstract

A major challenge in Reinforcement Learning (RL) is the difficulty of learning an optimal policy from sparse rewards. Prior works enhance online RL with conventional Imitation Learning (IL) via a handcrafted auxiliary objective, at the cost of restricting the RL policy to be sub-optimal when the offline data is generated by a non-expert policy. Instead, to better leverage the valuable information in offline data, we develop Generalized Imitation Learning from Demonstration (GILD), which meta-learns an objective that distills knowledge from offline data and instills intrinsic motivation towards the optimal policy. Unlike prior works that are tied to a specific RL algorithm, GILD is a flexible module intended for diverse vanilla off-policy RL algorithms. In addition, GILD introduces no domain-specific hyperparameters and only a minimal increase in computational cost. On four challenging MuJoCo tasks with sparse rewards, we show that three RL algorithms enhanced with GILD significantly outperform state-of-the-art methods.
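The bi-level structure the abstract describes can be illustrated with a deliberately simplified sketch. This is not the paper's actual GILD algorithm; it is a toy one-parameter example of the general idea: an inner policy update optimizes the RL loss plus a weighted imitation term, while an outer (meta) update tunes that weight so that post-update pure-RL performance improves. All losses, targets, and step sizes below are hypothetical choices for illustration.

```python
# Toy illustration of a meta-learned auxiliary objective (NOT the paper's
# GILD implementation). A scalar policy parameter theta is pulled toward a
# sub-optimal demonstrator at 1.5 by imitation, while the true optimum is 2.0;
# the meta-update adapts the imitation weight w so the policy is not stuck
# at the demonstrator's behavior.

def rl_loss(theta):
    # Proxy for negative RL return; the optimal policy sits at theta = 2.0.
    return (theta - 2.0) ** 2

def imitation_loss(theta):
    # Offline data comes from a sub-optimal demonstrator at theta = 1.5.
    return (theta - 1.5) ** 2

def inner_step(theta, w, eta=0.1):
    # One gradient step on the combined objective rl_loss + w * imitation_loss,
    # using the analytic gradients of the quadratics above.
    grad = 2.0 * (theta - 2.0) + w * 2.0 * (theta - 1.5)
    return theta - eta * grad

theta, w = 0.0, 1.0   # policy parameter and meta-learned imitation weight
meta_lr, eps = 0.5, 1e-4

for _ in range(500):
    # Outer (meta) update: finite-difference gradient of the post-update RL
    # loss with respect to w. Early on, imitation helps and w grows; once it
    # starts anchoring the policy to the demonstrator, w shrinks toward zero.
    g = (rl_loss(inner_step(theta, w + eps)) -
         rl_loss(inner_step(theta, w - eps))) / (2.0 * eps)
    w -= meta_lr * g
    # Inner update: policy step under the current meta-learned objective.
    theta = inner_step(theta, w)

print(theta, w)  # theta approaches the optimum 2.0, not the demonstrator's 1.5
```

A fixed handcrafted weight would leave the policy at a compromise between 1.5 and 2.0; the meta-update is what lets the learner exploit the demonstrations early and then move past the non-expert behavior, mirroring the motivation stated in the abstract.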

Published

2025-04-11

How to Cite

Deng, S., Zheng, Z., He, H., Weng, P., & Shao, J. (2025). Enhancing Online Reinforcement Learning with Meta-Learned Objective from Offline Data. Proceedings of the AAAI Conference on Artificial Intelligence, 39(15), 16244–16252. https://doi.org/10.1609/aaai.v39i15.33784

Section

AAAI Technical Track on Machine Learning I