CHDP: Cooperative Hybrid Diffusion Policies for Reinforcement Learning in Parameterized Action Space

Authors

  • Bingyi Liu, School of Computer Science and Artificial Intelligence, Wuhan University of Technology; Hubei Key Laboratory of Transportation Internet of Things (Wuhan University of Technology)
  • Jinbo He, School of Computer Science and Artificial Intelligence, Wuhan University of Technology
  • Haiyong Shi, School of Computer Science and Artificial Intelligence, Wuhan University of Technology
  • Enshu Wang, School of Cyber Science and Engineering, Wuhan University
  • Weizhen Han, School of Computer Science and Artificial Intelligence, Wuhan University of Technology
  • Jingxiang Hao, School of Cyber Science and Engineering, Wuhan University
  • Peixi Wang, School of Computer Science and Artificial Intelligence, Wuhan University of Technology
  • Zhuangzhuang Zhang, Department of Computer Science, City University of Hong Kong

DOI:

https://doi.org/10.1609/aaai.v40i28.39537

Abstract

Hybrid action spaces, which combine discrete choices with continuous parameters, are prevalent in domains such as robot control and game AI. However, efficiently modeling and optimizing over hybrid discrete-continuous action spaces remains a fundamental challenge, mainly due to limited policy expressiveness and poor scalability in high-dimensional settings. To address this challenge, we view the hybrid action space problem as a fully cooperative game and propose the Cooperative Hybrid Diffusion Policies (CHDP) framework to solve it. CHDP employs two cooperative agents, one using a discrete diffusion policy and the other a continuous diffusion policy. The continuous policy is conditioned on the discrete action's representation, explicitly modeling the dependency between the two. This cooperative design allows the diffusion policies to leverage their expressiveness to capture complex distributions in their respective action spaces. To mitigate the update conflicts that arise when both policies are updated simultaneously, we employ a sequential update scheme that fosters co-adaptation. Moreover, to improve scalability when learning in high-dimensional discrete action spaces, we construct a codebook that embeds the action space into a low-dimensional latent space. This mapping enables the discrete policy to learn in a compact, structured space. Finally, we design a Q-function-based guidance mechanism to align the codebook's embeddings with the discrete policy's representation during training. On challenging hybrid action benchmarks, CHDP outperforms the state-of-the-art method by up to 19.3% in success rate.
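
The action-selection flow described in the abstract (a discrete policy acting in a latent space, a codebook of discrete-action embeddings, and a continuous policy conditioned on the chosen embedding) can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the function names, the nearest-neighbour decoding rule, and the toy stand-in policies are hypothetical and are not taken from the paper, whose actual policies are diffusion models trained with Q-function guidance.

```python
import numpy as np


def decode_discrete_action(latent, codebook):
    """Return the index of the codebook row closest to `latent`.

    Assumption: Euclidean nearest-neighbour decoding. The abstract does not
    state how a latent vector is mapped back to a discrete action.
    """
    distances = np.linalg.norm(codebook - latent, axis=1)
    return int(np.argmin(distances))


def select_hybrid_action(state, codebook, discrete_policy, continuous_policy):
    """Pick a discrete action, then continuous parameters conditioned on it."""
    latent = discrete_policy(state)                 # point in the latent space
    index = decode_discrete_action(latent, codebook)
    embedding = codebook[index]                     # discrete action's representation
    params = continuous_policy(state, embedding)    # conditioned on that representation
    return index, params


# Hypothetical stand-ins for the two policies (NOT diffusion models).
def toy_discrete_policy(state):
    return np.array([0.9, 0.1])


def toy_continuous_policy(state, embedding):
    return np.tanh(state[:2] + embedding)


# Three discrete actions embedded in a 2-dimensional latent space.
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
state = np.array([0.5, -0.5, 0.0])

action_index, action_params = select_hybrid_action(
    state, codebook, toy_discrete_policy, toy_continuous_policy
)
```

In this toy setup the latent point `[0.9, 0.1]` lies closest to the second codebook row, so `action_index` is 1 and the continuous parameters are computed from that row's embedding. The sketch covers only inference-time composition; the sequential update scheme and codebook alignment mentioned in the abstract are not modeled.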

Published

2026-03-14

How to Cite

Liu, B., He, J., Shi, H., Wang, E., Han, W., Hao, J., … Zhang, Z. (2026). CHDP: Cooperative Hybrid Diffusion Policies for Reinforcement Learning in Parameterized Action Space. Proceedings of the AAAI Conference on Artificial Intelligence, 40(28), 23640–23648. https://doi.org/10.1609/aaai.v40i28.39537

Section

AAAI Technical Track on Machine Learning V