Generalizable Task Representation Learning for Offline Meta-Reinforcement Learning with Data Limitations

Authors

  • Renzhe Zhou, National Key Laboratory for Novel Software Technology, Nanjing University, China; School of Artificial Intelligence, Nanjing University, China
  • Chen-Xiao Gao, National Key Laboratory for Novel Software Technology, Nanjing University, China; School of Artificial Intelligence, Nanjing University, China
  • Zongzhang Zhang, National Key Laboratory for Novel Software Technology, Nanjing University, China; School of Artificial Intelligence, Nanjing University, China
  • Yang Yu, National Key Laboratory for Novel Software Technology, Nanjing University, China; School of Artificial Intelligence, Nanjing University, China

DOI:

https://doi.org/10.1609/aaai.v38i15.29658

Keywords:

ML: Reinforcement Learning, ML: Transfer, Domain Adaptation, Multi-Task Learning

Abstract

Generalization and sample efficiency have been long-standing issues in reinforcement learning, and the field of Offline Meta-Reinforcement Learning (OMRL) has therefore gained increasing attention for its potential to solve a wide range of problems from static, limited offline data. Existing OMRL methods often assume sufficient training tasks and data coverage in order to apply contrastive learning to extract task representations. However, such assumptions do not hold in several real-world applications, which undermines the generalization ability of the learned representations. In this paper, we consider OMRL under two types of data limitations: limited training tasks and limited behavior diversity. We propose a novel algorithm called GENTLE for learning generalizable task representations in the face of these limitations. GENTLE employs a Task Auto-Encoder (TAE), an encoder-decoder architecture that extracts the characteristics of each task. Unlike existing methods, the TAE is optimized solely by reconstructing state transitions and rewards, which captures the generative structure of the task models and produces generalizable representations even when training tasks are limited. To alleviate the effect of limited behavior diversity, we construct pseudo-transitions to align the data distribution used to train the TAE with the data distribution encountered during testing. Empirically, GENTLE significantly outperforms existing OMRL methods on both in-distribution and out-of-distribution tasks, under both the given-context protocol and the one-shot protocol.
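
The paper's implementation is not reproduced on this page; the following is a minimal PyTorch sketch of the reconstruction-only task auto-encoder idea described in the abstract. All names, layer sizes, and the mean-pooling aggregation of context transitions are illustrative assumptions, not the authors' architecture.

    import torch
    import torch.nn as nn

    class TaskAutoEncoder(nn.Module):
        """Encoder-decoder over (s, a, r, s') transitions; the latent z summarizes the task."""
        def __init__(self, state_dim, action_dim, latent_dim=8, hidden=128):
            super().__init__()
            # Encoder: maps each transition to an embedding; embeddings from a
            # context batch are averaged into a single task representation.
            self.encoder = nn.Sequential(
                nn.Linear(state_dim + action_dim + 1 + state_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, latent_dim),
            )
            # Decoder: predicts reward and next state from (s, a, z). Training
            # the latent purely through this reconstruction targets the task's
            # generative structure (dynamics and reward), with no contrastive term.
            self.decoder = nn.Sequential(
                nn.Linear(state_dim + action_dim + latent_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 1 + state_dim),
            )

        def encode(self, s, a, r, s_next):
            z = self.encoder(torch.cat([s, a, r, s_next], dim=-1))
            return z.mean(dim=0, keepdim=True)  # aggregate the context into one task vector

        def forward(self, s, a, r, s_next):
            z = self.encode(s, a, r, s_next).expand(s.shape[0], -1)
            pred = self.decoder(torch.cat([s, a, z], dim=-1))
            return pred[:, :1], pred[:, 1:]  # (reward prediction, next-state prediction)

    def tae_loss(tae, s, a, r, s_next):
        # Reconstruction-only objective on a context batch drawn from one task.
        r_hat, s_next_hat = tae(s, a, r, s_next)
        return ((r_hat - r) ** 2).mean() + ((s_next_hat - s_next) ** 2).mean()

In this sketch the context batch would be sampled from a single task's offline data (and, per the abstract, augmented with pseudo-transitions to better match the test-time data distribution), so minimizing the loss forces the latent to encode whatever distinguishes that task's dynamics and rewards.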

Published

2024-03-24

How to Cite

Zhou, R., Gao, C.-X., Zhang, Z., & Yu, Y. (2024). Generalizable Task Representation Learning for Offline Meta-Reinforcement Learning with Data Limitations. Proceedings of the AAAI Conference on Artificial Intelligence, 38(15), 17132-17140. https://doi.org/10.1609/aaai.v38i15.29658

Section

AAAI Technical Track on Machine Learning VI