Enhancing Diffusion Policies with Distribution-Matching Generator in Offline Reinforcement Learning

Xuemin Hu; Shen Li; Yingfen Xu; Bo Tang; Long Chen

doi:10.1609/aaai.v40i26.39342

Authors

Xuemin Hu, Hubei University; Key Laboratory of Intelligent Sensing System and Security (Hubei University), Ministry of Education
Shen Li, Hubei University; Tongji University
Yingfen Xu, Hubei University
Bo Tang, Worcester Polytechnic Institute
Long Chen, Institute of Automation, Chinese Academy of Sciences

DOI:

https://doi.org/10.1609/aaai.v40i26.39342

Abstract

Offline reinforcement learning (RL) learns policies from pre-collected datasets without interacting with the environment, but it suffers from the out-of-distribution (OOD) problem. Recent methods use the generative adversarial paradigm to learn policies, but they struggle to reconcile two competing objectives: fooling the discriminator and maximizing expected returns. In this paper, we propose a novel offline RL method named Distribution-Matching Generator-based Diffusion Policies (DMGDP). We first develop a distribution-matching policy learning method, in which a diffusion model serves as the policy generator, to handle this conflict. Furthermore, we design a policy confidence mechanism based on discriminator regularization that prevents the agent from taking OOD actions and makes generative adversarial learning more robust. Extensive experiments on the D4RL benchmarks demonstrate that DMGDP outperforms state-of-the-art methods.
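The conflict mentioned in the abstract can be illustrated with a toy one-dimensional example. The sketch below is not the DMGDP method and is not taken from the paper. It assumes a hypothetical discriminator that peaks at action 0 (where an imagined dataset is concentrated) and a hypothetical critic that prefers larger actions, then shows that the two terms of a generic adversarial policy loss pull the action in opposite directions. All function names, the weight `lam`, and the functional forms are assumptions chosen for illustration.

```python
import math


def discriminator(action: float) -> float:
    """Toy discriminator (assumed form): probability that `action` is in-dataset.

    Equals sigmoid(2 - action**2), so it peaks at action = 0.
    """
    return 1.0 / (1.0 + math.exp(action ** 2 - 2.0))


def q_value(action: float) -> float:
    """Toy critic (assumed form): larger actions yield higher estimated return."""
    return action


def generator_loss(action: float, lam: float = 1.0) -> float:
    """Generic adversarial policy loss: fool the discriminator and maximize Q."""
    return -math.log(discriminator(action)) - lam * q_value(action)


def adversarial_grad(action: float) -> float:
    """Derivative of -log D(action) with respect to action."""
    return (1.0 - discriminator(action)) * 2.0 * action


def return_grad(action: float, lam: float = 1.0) -> float:
    """Derivative of -lam * Q(action) with respect to action."""
    return -lam


if __name__ == "__main__":
    a = 1.0
    # Gradient descent on the first term moves the action toward 0,
    # while descent on the second term moves it away from 0.
    print("adversarial term gradient:", adversarial_grad(a))
    print("return term gradient:", return_grad(a))
```

For any positive action in this toy setting, the adversarial gradient is positive and the return gradient is negative, so a single policy update cannot fully satisfy both objectives at once.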
