Offline Meta-Reinforcement Learning with Flow-Based Task Inference and Adaptive Correction of Feature Overgeneralization

Authors

  • Min Wang Beijing Institute of Technology
  • Xin Li Beijing Institute of Technology; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University
  • Mingzhong Wang University of the Sunshine Coast
  • Hasnaa Bennis Beijing Institute of Technology

DOI:

https://doi.org/10.1609/aaai.v40i31.39845

Abstract

Offline meta-reinforcement learning (OMRL) combines the strengths of learning from diverse datasets in offline RL with the adaptability to new tasks of meta-RL, promising safe and efficient knowledge acquisition by RL agents. However, OMRL still suffers from extrapolation errors caused by out-of-distribution (OOD) actions, a problem compounded by the broad task distributions and Markov Decision Process (MDP) ambiguity of meta-RL setups. Existing research indicates that the generalization of the Q network affects the extrapolation error in offline RL. This paper investigates this relationship by decomposing the Q value into feature and weight components, observing that while the decomposition enhances adaptability and convergence on high-quality data, it often leads to policy degeneration or collapse in complex tasks. We observe that decomposed Q values introduce a large estimation bias when the feature component encounters OOD samples, a phenomenon we term "feature overgeneralization". To address this issue, we propose FLORA, which identifies OOD samples by modeling feature distributions and estimating their uncertainties. FLORA integrates a return feedback mechanism to adaptively adjust the feature components. Furthermore, to learn precise task representations, FLORA explicitly models the complex task distribution using a chain of invertible transformations. We demonstrate, theoretically and empirically, that FLORA achieves rapid adaptation and meta-policy improvement over baselines across various environments.
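The abstract names two mechanisms without implementation detail: a Q value decomposed into feature and weight components, with feature-level uncertainty used to flag OOD samples, and a chain of invertible transformations (a normalizing flow) for task representation. The PyTorch sketch below is a hypothetical illustration of both ideas under assumed architectures; DecomposedQNetwork, feature_uncertainty, and AffineCoupling are invented names for illustration, not FLORA's actual components.

import torch
import torch.nn as nn

class DecomposedQNetwork(nn.Module):
    """Hypothetical decomposition Q(s, a) = <phi(s, a), w>:
    a feature extractor phi and a linear weight head w."""
    def __init__(self, state_dim, action_dim, feat_dim=64):
        super().__init__()
        self.phi = nn.Sequential(
            nn.Linear(state_dim + action_dim, 128), nn.ReLU(),
            nn.Linear(128, feat_dim),
        )
        self.w = nn.Linear(feat_dim, 1, bias=False)  # weight component

    def forward(self, state, action):
        feat = self.phi(torch.cat([state, action], dim=-1))  # feature component
        return self.w(feat).squeeze(-1), feat

def feature_uncertainty(feat, mu, cov_inv):
    """Mahalanobis distance of a feature batch from in-distribution
    feature statistics (mu, cov_inv); a large value suggests an OOD sample."""
    diff = feat - mu
    return torch.einsum('bi,ij,bj->b', diff, cov_inv, diff)

class AffineCoupling(nn.Module):
    """One invertible affine coupling layer; chaining several such layers
    forms a normalizing flow over task embeddings."""
    def __init__(self, dim):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half, 64), nn.ReLU(),
            nn.Linear(64, 2 * (dim - self.half)),
        )

    def forward(self, z):
        z1, z2 = z[..., :self.half], z[..., self.half:]
        s, t = self.net(z1).chunk(2, dim=-1)
        y2 = z2 * torch.exp(s) + t   # invertible given (s, t)
        log_det = s.sum(-1)          # log |det Jacobian| for density evaluation
        return torch.cat([z1, y2], dim=-1), log_det

In such a setup, the feature statistics (mu, cov_inv) would be fitted on features of the offline dataset, and stacking several AffineCoupling layers yields the invertible chain the abstract describes; the return feedback mechanism for adjusting the feature component is not specified in the abstract and is omitted here.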

Published

2026-03-14

How to Cite

Wang, M., Li, X., Wang, M., & Bennis, H. (2026). Offline Meta-Reinforcement Learning with Flow-Based Task Inference and Adaptive Correction of Feature Overgeneralization. Proceedings of the AAAI Conference on Artificial Intelligence, 40(31), 26390–26397. https://doi.org/10.1609/aaai.v40i31.39845

Issue

Vol. 40 No. 31 (2026)
Section

AAAI Technical Track on Machine Learning VIII