Enhancing Diffusion Policies with Distribution-Matching Generator in Offline Reinforcement Learning

Authors

  • Xuemin Hu, Hubei University; Key Laboratory of Intelligent Sensing System and Security (Hubei University), Ministry of Education
  • Shen Li, Hubei University; Tongji University
  • Yingfen Xu, Hubei University
  • Bo Tang, Worcester Polytechnic Institute
  • Long Chen, Institute of Automation, Chinese Academy of Sciences

DOI:

https://doi.org/10.1609/aaai.v40i26.39342

Abstract

Offline reinforcement learning (RL) learns policies from pre-collected datasets without interacting with the environment, but it suffers from the out-of-distribution (OOD) problem. Recent methods use the generative adversarial paradigm to learn policies, but they often fail to resolve the conflict between fooling the discriminator and maximizing expected returns. In this paper, we propose a novel offline RL method named Distribution-Matching Generator-based Diffusion Policies (DMGDP). We first develop a distribution-matching-based policy learning method, in which a diffusion model serves as the policy generator, to handle this conflict. Furthermore, we design a policy confidence mechanism based on discriminator regularization that prevents the agent from taking OOD actions and makes generative adversarial learning more robust. We conduct extensive experiments on the D4RL benchmarks, and the results demonstrate that DMGDP outperforms state-of-the-art methods.

Published

2026-03-14

How to Cite

Hu, X., Li, S., Xu, Y., Tang, B., & Chen, L. (2026). Enhancing Diffusion Policies with Distribution-Matching Generator in Offline Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(26), 21894–21902. https://doi.org/10.1609/aaai.v40i26.39342

Section

AAAI Technical Track on Machine Learning III