Improving Generalization in Offline Reinforcement Learning via Latent Distribution Representation Learning
DOI:
https://doi.org/10.1609/aaai.v39i20.35402
Abstract
Dealing with distribution shift is a significant challenge when building offline reinforcement learning (RL) models that must generalize from a static dataset to out-of-distribution (OOD) scenarios. Previous approaches have employed pessimism or conservatism strategies. More recently, data-driven work has taken a distributional perspective, treating offline RL as a domain adaptation problem. However, these methods rely on heuristic techniques to simulate distribution shifts, yielding only a limited diversity of artificially created distribution gaps. In this paper, we propose a novel perspective: offline datasets inherently contain multiple latent distributions, since behavior data from diverse policies may follow different distributions, and even data from the same policy across different time phases can exhibit distributional variance. We introduce the Latent Distribution Representation Learning (LAD) framework, which characterizes the multiple latent distributions within offline data and reduces the distribution gap between any pair of them. LAD consists of a min-max adversarial process: it first identifies the "worst-case" distributions to enlarge the diversity of distribution gaps, and then reduces these gaps to learn invariant representations for generalization. We derive a generalization error bound to support LAD theoretically and verify its effectiveness through extensive experiments.
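The min-max process described in the abstract can be sketched on toy data. The following is a minimal illustration, not the paper's implementation: it assumes the offline data has already been partitioned into latent groups, uses a plain linear map as the "representation", measures the gap between two groups as the squared distance of their means in representation space, and omits the RL objective that would normally be balanced against gap reduction. All function and variable names (`latent_gap`, `lad_sketch`, `W`) are hypothetical.

```python
import numpy as np

def latent_gap(W, mu_i, mu_j):
    """Squared distance between two latent-group means in representation space."""
    d = W @ (mu_i - mu_j)
    return float(d @ d)

def lad_sketch(groups, steps=200, lr=0.05, seed=0):
    """Toy min-max loop: the max step picks the worst-case pair of latent
    distributions (largest representation gap); the min step updates the
    linear encoder W to shrink that gap. In practice this would be balanced
    with a policy/value objective so the representation does not collapse."""
    rng = np.random.default_rng(seed)
    dim = groups[0].shape[1]
    W = rng.normal(size=(dim, dim))
    mus = [g.mean(axis=0) for g in groups]
    pairs = [(i, j) for i in range(len(mus)) for j in range(i + 1, len(mus))]
    for _ in range(steps):
        # max: find the pair with the largest gap ("worst case")
        i, j = max(pairs, key=lambda p: latent_gap(W, mus[p[0]], mus[p[1]]))
        # min: gradient step on ||W (mu_i - mu_j)||^2 to reduce that gap
        d = (mus[i] - mus[j]).reshape(-1, 1)
        grad = 2.0 * (W @ d) @ d.T
        W -= lr * grad
    return W

# toy "offline data": three behavior modes acting as latent distributions
rng = np.random.default_rng(1)
groups = [rng.normal(loc=c, size=(50, 2)) for c in ([0, 0], [3, 0], [0, 3])]
W = lad_sketch(groups)
```

After the loop, the pairwise gaps between group means in representation space are small, i.e. the representation is (trivially, in this toy setting) invariant across the latent distributions.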
Published
2025-04-11
How to Cite
Wang, D., Li, L., Wei, W., Yu, Q., Hao, J., & Liang, J. (2025). Improving Generalization in Offline Reinforcement Learning via Latent Distribution Representation Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 39(20), 21053–21061. https://doi.org/10.1609/aaai.v39i20.35402
Section
AAAI Technical Track on Machine Learning VI