Enhancing Diffusion Policies with Distribution-Matching Generator in Offline Reinforcement Learning
DOI:
https://doi.org/10.1609/aaai.v40i26.39342
Abstract
Offline reinforcement learning (RL) can learn policies from pre-collected datasets without interacting with the environment, but it suffers from the out-of-distribution (OOD) problem. Recent methods use the generative adversarial paradigm to learn policies, but they often fail to resolve the conflict between fooling the discriminator and maximizing expected returns. In this paper, we propose a novel offline RL method named Distribution-Matching Generator-based Diffusion Policies (DMGDP). We first develop a distribution-matching-based policy learning method, in which a diffusion model serves as the policy generator, to handle this conflict. Furthermore, we design a policy confidence mechanism based on discriminator regularization to prevent the agent from taking OOD actions and to make generative adversarial learning more robust. We conducted extensive experiments on the D4RL benchmarks, and the results demonstrate that DMGDP outperforms state-of-the-art methods.
Published
2026-03-14
How to Cite
Hu, X., Li, S., Xu, Y., Tang, B., & Chen, L. (2026). Enhancing Diffusion Policies with Distribution-Matching Generator in Offline Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(26), 21894–21902. https://doi.org/10.1609/aaai.v40i26.39342
Issue
Section
AAAI Technical Track on Machine Learning III