Variational OOD State Correction for Offline Reinforcement Learning

Authors

  • Ke Jiang Nanjing University of Aeronautics and Astronautics
  • Wen Jiang Nanjing University of Aeronautics and Astronautics
  • Xiaoyang Tan Nanjing University of Aeronautics and Astronautics

DOI:

https://doi.org/10.1609/aaai.v40i27.39390

Abstract

The performance of offline reinforcement learning is significantly impacted by state distributional shift, and out-of-distribution (OOD) state correction is a popular approach to addressing this problem. However, previous methods correct the agent's transition distributions in a supervised way, which significantly degrades flexibility and robustness. In this paper, we propose a novel method named Density-Aware Safety Perception (DASP) for OOD state correction. Specifically, our method encourages the agent to prioritize actions that lead to outcomes with higher data density, thereby promoting its operation within, or its return to, in-distribution (safe) regions. To achieve this, we optimize the objective within a variational framework that jointly considers the potential outcomes of decision-making and their density, thus providing crucial contextual information for safe decision-making. Finally, we validate the effectiveness and feasibility of the proposed method through extensive experiments on the offline MuJoCo and AntMaze suites.
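The core intuition in the abstract, preferring actions whose outcomes fall in regions of higher data density, can be illustrated with a small toy sketch. This is not the DASP algorithm and does not use its variational objective. It assumes a hypothetical additive dynamics model (`toy_dynamics`), a plain Gaussian kernel density estimate over dataset states (`log_kde`), and an illustrative scoring rule that adds a weighted log-density bonus to given Q-values. All names, the bandwidth, and the weight are assumptions made for illustration only.

```python
import numpy as np


def log_kde(query, data, bandwidth=0.5):
    """Log of a Gaussian kernel density estimate of `query` points under `data`."""
    diff = query[:, None, :] - data[None, :, :]                 # shape (Q, N, D)
    neg_sq = -np.sum(diff ** 2, axis=-1) / (2.0 * bandwidth ** 2)  # shape (Q, N)
    n, d = data.shape
    log_norm = -0.5 * d * np.log(2.0 * np.pi * bandwidth ** 2) - np.log(n)
    # Log-sum-exp over the dataset axis for numerical stability.
    m = np.max(neg_sq, axis=1, keepdims=True)
    return m[:, 0] + np.log(np.sum(np.exp(neg_sq - m), axis=1)) + log_norm


def toy_dynamics(state, actions):
    """Hypothetical dynamics for illustration only: next state = state + action."""
    return state[None, :] + actions


def select_action(state, actions, q_values, dataset_states, weight=1.0):
    """Pick the candidate action maximizing Q plus a weighted log-density bonus
    on the predicted next state (an illustrative rule, not the paper's objective)."""
    next_states = toy_dynamics(state, actions)
    scores = q_values + weight * log_kde(next_states, dataset_states)
    return int(np.argmax(scores))


# Toy data: dataset states cluster around the origin; the agent starts far away.
rng = np.random.default_rng(0)
dataset_states = rng.normal(loc=0.0, scale=0.3, size=(500, 2))
state = np.array([2.0, 2.0])
actions = np.array([[1.0, 1.0],     # moves further from the data
                    [-1.0, -1.0],   # moves back toward the data
                    [0.0, 0.0]])    # stays in place
q_values = np.zeros(len(actions))   # equal values, so density decides

best = select_action(state, actions, q_values, dataset_states)
densities = log_kde(toy_dynamics(state, actions), dataset_states)
```

With equal Q-values, the selected action is the one that moves the agent back toward the region covered by the dataset, which is the corrective behavior the abstract describes at a high level.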

Published

2026-03-14

How to Cite

Jiang, K., Jiang, W., & Tan, X. (2026). Variational OOD State Correction for Offline Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(27), 22327–22335. https://doi.org/10.1609/aaai.v40i27.39390

Issue

Vol. 40 No. 27 (2026)

Section

AAAI Technical Track on Machine Learning IV