Improving Generalization in Offline Reinforcement Learning via Latent Distribution Representation Learning

Authors

  • Da Wang, Shanxi University
  • Lin Li, Shanxi University
  • Wei Wei, Shanxi University
  • Qixian Yu, Shanxi University
  • Jianye Hao, Tianjin University
  • Jiye Liang, Shanxi University

DOI:

https://doi.org/10.1609/aaai.v39i20.35402

Abstract

Dealing with distribution shift is a significant challenge when building offline reinforcement learning (RL) models that can generalize from a static dataset to out-of-distribution (OOD) scenarios. Previous approaches have employed pessimism or conservatism strategies. More recently, data-driven work has taken a distributional perspective, treating offline RL as a domain adaptation problem. However, these methods use heuristic techniques to simulate distribution shifts, resulting in limited diversity among the artificially created distribution gaps. In this paper, we propose a novel perspective: offline datasets inherently contain multiple latent distributions; behavior data from diverse policies may follow different distributions, and data from the same policy across different time phases can also exhibit distributional variance. We introduce the Latent Distribution Representation Learning (LAD) framework, which characterizes the multiple latent distributions within offline data and reduces the distribution gap between any pair of them. LAD consists of a min-max adversarial process: it first identifies the "worst-case" distributions to enlarge the diversity of distribution gaps, then reduces these gaps to learn invariant representations for generalization. We derive a generalization error bound to support LAD theoretically and verify its effectiveness through extensive experiments.
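The min-max process described above can be illustrated with a small numerical sketch. Everything here is an assumption for illustration only, not the paper's implementation: the offline data are synthetic groups standing in for latent distributions, the encoder is a toy linear map, and the distribution gap is measured as the squared distance between group means in representation space. The inner max step selects the worst-case pair of latent distributions; the outer min step updates the encoder to shrink that gap.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for an offline dataset whose samples fall into K latent
# groups (e.g. different behavior policies, or phases of one policy). All
# names and shapes here are illustrative assumptions, not the paper's setup.
K, n, d, k = 4, 256, 8, 3
groups = [rng.normal(loc=rng.normal(0.0, 1.0, d), scale=1.0, size=(n, d))
          for _ in range(K)]

W = rng.normal(0.0, 0.1, (k, d))  # toy linear encoder: phi(x) = W x


def pair_gap(W, Xi, Xj):
    """Squared distance between two groups' mean representations."""
    diff = W @ (Xi.mean(axis=0) - Xj.mean(axis=0))
    return float(diff @ diff)


pairs = [(i, j) for i in range(K) for j in range(i + 1, K)]
initial_gap = max(pair_gap(W, groups[i], groups[j]) for i, j in pairs)

lr = 0.01
for _ in range(200):
    # Max step: pick the "worst-case" pair of latent distributions.
    i, j = max(pairs, key=lambda p: pair_gap(W, groups[p[0]], groups[p[1]]))
    # Min step: gradient descent on the encoder to shrink that gap.
    mu = groups[i].mean(axis=0) - groups[j].mean(axis=0)
    W -= lr * 2.0 * np.outer(W @ mu, mu)  # d(gap)/dW = 2 (W mu) mu^T

final_gap = max(pair_gap(W, groups[i], groups[j]) for i, j in pairs)
# The worst-case gap shrinks as the encoder becomes invariant across groups.
# In the actual method, an RL objective would keep the representation from
# collapsing; this toy version only demonstrates the gap-reduction dynamics.
```

Note that without a complementary task loss the trivial solution W = 0 minimizes all gaps, which is why the full framework couples gap reduction with the offline RL objective.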

Published

2025-04-11

How to Cite

Wang, D., Li, L., Wei, W., Yu, Q., Hao, J., & Liang, J. (2025). Improving Generalization in Offline Reinforcement Learning via Latent Distribution Representation Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 39(20), 21053–21061. https://doi.org/10.1609/aaai.v39i20.35402

Section

AAAI Technical Track on Machine Learning VI