Improving Generalization in Offline Reinforcement Learning via Latent Distribution Representation Learning
DOI:
https://doi.org/10.1609/aaai.v39i20.35402
Abstract
Dealing with distribution shift is a significant challenge when building offline reinforcement learning (RL) models that must generalize from a static dataset to out-of-distribution (OOD) scenarios. Previous approaches have employed pessimism or conservatism strategies. More recently, data-driven work has taken a distributional perspective, treating offline RL as a domain adaptation problem. However, these methods rely on heuristic techniques to simulate distribution shifts, yielding only a limited diversity of artificially created distribution gaps. In this paper, we propose a novel perspective: offline datasets inherently contain multiple latent distributions, since behavior data from diverse policies may follow different distributions, and even data from the same policy across different time phases can exhibit distributional variance. We introduce the Latent Distribution Representation Learning (LAD) framework, which characterizes the multiple latent distributions within offline data and reduces the distribution gap between any pair of them. LAD consists of a min-max adversarial process: it first identifies the "worst-case" distributions to enlarge the diversity of distribution gaps, and then reduces these gaps to learn invariant representations for generalization. We derive a generalization error bound to support LAD theoretically and verify its effectiveness through extensive experiments.
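The min-max process described in the abstract can be sketched on toy data. The following is a minimal illustration, not the paper's implementation: it assumes the offline data has already been partitioned into latent groups, uses a plain linear map as the "representation", measures the gap between two groups as the squared distance of their means in representation space, and omits the RL objective that would normally be balanced against gap reduction. All function and variable names (`latent_gap`, `lad_sketch`, `W`) are hypothetical.

```python
import numpy as np

def latent_gap(W, mu_i, mu_j):
    """Squared distance between two latent-group means in representation space."""
    d = W @ (mu_i - mu_j)
    return float(d @ d)

def lad_sketch(groups, steps=200, lr=0.05, seed=0):
    """Toy min-max loop: the max step picks the worst-case pair of latent
    distributions (largest representation gap); the min step updates the
    linear encoder W to shrink that gap. In practice this would be balanced
    with a policy/value objective so the representation does not collapse."""
    rng = np.random.default_rng(seed)
    dim = groups[0].shape[1]
    W = rng.normal(size=(dim, dim))
    mus = [g.mean(axis=0) for g in groups]
    pairs = [(i, j) for i in range(len(mus)) for j in range(i + 1, len(mus))]
    for _ in range(steps):
        # max: find the pair with the largest gap ("worst case")
        i, j = max(pairs, key=lambda p: latent_gap(W, mus[p[0]], mus[p[1]]))
        # min: gradient step on ||W (mu_i - mu_j)||^2 to reduce that gap
        d = (mus[i] - mus[j]).reshape(-1, 1)
        grad = 2.0 * (W @ d) @ d.T
        W -= lr * grad
    return W

# toy "offline data": three behavior modes acting as latent distributions
rng = np.random.default_rng(1)
groups = [rng.normal(loc=c, size=(50, 2)) for c in ([0, 0], [3, 0], [0, 3])]
W = lad_sketch(groups)
```

After the loop, the pairwise gaps between group means in representation space are small, i.e. the representation is (trivially, in this toy setting) invariant across the latent distributions.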
Published
2025-04-11
How to Cite
Wang, D., Li, L., Wei, W., Yu, Q., Hao, J., & Liang, J. (2025). Improving Generalization in Offline Reinforcement Learning via Latent Distribution Representation Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 39(20), 21053–21061. https://doi.org/10.1609/aaai.v39i20.35402
Section
AAAI Technical Track on Machine Learning VI