State Deviation Correction for Offline Reinforcement Learning
Keywords: Machine Learning (ML)
Abstract

Offline reinforcement learning aims to maximize the expected cumulative reward using a fixed collection of data. The basic principle of current offline reinforcement learning methods is to restrict the policy to the offline dataset's action space. However, they ignore the case where the dataset's trajectories fail to cover the state space completely. In particular, when the dataset's size is limited, the agent is likely to encounter unseen states at test time. Prior policy-constrained methods are incapable of correcting such state deviation and may lead the agent even further into unexpected regions. In this paper, we propose the state deviation correction (SDC) method, which constrains the policy's induced state distribution by penalizing out-of-distribution states that might appear at test time. We first perturb states sampled from the logged dataset, then simulate noisy next states on the basis of a dynamics model and the policy. We then train the policy to minimize the distances between these noisy next states and the offline dataset. In this manner, the trained policy guides the agent back to familiar regions. Experimental results demonstrate that our proposed method is competitive with state-of-the-art methods in a GridWorld setup, the offline MuJoCo control suite, and a modified offline MuJoCo dataset with a finite number of valuable samples.
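The training loop sketched in the abstract (perturb dataset states, roll them one step through a dynamics model under the current policy, and penalize distance to the dataset) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the toy policy, the toy dynamics model, and the nearest-neighbor distance penalty are all simplifying assumptions chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical offline dataset of states (shape and values are illustrative).
dataset_states = rng.normal(size=(100, 3))

def policy(state, w):
    # Toy linear-tanh policy; stands in for the learned actor.
    return np.tanh(state @ w)

def dynamics(state, action):
    # Toy dynamics model; in SDC this would be learned from the offline data.
    return state + 0.1 * action

def sdc_penalty(states, w, noise_scale=0.1):
    """Perturb dataset states, simulate one step under the policy via the
    dynamics model, and penalize the distance from each simulated next
    state to its nearest neighbor in the offline dataset."""
    noisy = states + noise_scale * rng.normal(size=states.shape)
    next_states = dynamics(noisy, policy(noisy, w))
    # Pairwise distances between simulated next states and dataset states.
    dists = np.linalg.norm(
        next_states[:, None, :] - dataset_states[None, :, :], axis=-1
    )
    # Mean nearest-neighbor distance: the quantity the policy is trained
    # to minimize, pulling induced states back toward the dataset.
    return dists.min(axis=1).mean()

w = rng.normal(size=(3, 3))  # hypothetical policy parameters
penalty = sdc_penalty(dataset_states[:16], w)
print(penalty)
```

In the full method this penalty would be differentiated with respect to the policy parameters and combined with the usual offline RL objective; the sketch only shows how the out-of-distribution penalty itself is formed.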
How to Cite
Zhang, H., Shao, J., Jiang, Y., He, S., Zhang, G., & Ji, X. (2022). State Deviation Correction for Offline Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 36(8), 9022-9030. https://doi.org/10.1609/aaai.v36i8.20886
AAAI Technical Track on Machine Learning III