DialoGen: Towards Dialog Gesture Generation via Identity-Decoupled Style Guidance in Interactive Diffusion Model

Authors

  • Weiyu Zhao Harbin Institute of Technology
  • Chenyang Wang Harbin Institute of Technology
  • Liangxiao Hu Harbin Institute of Technology
  • Zonglin Li Harbin Institute of Technology
  • Wei Yu Tsinghua University
  • Shengping Zhang Harbin Institute of Technology

DOI:

https://doi.org/10.1609/aaai.v40i16.38327

Abstract

We propose DialoGen, a novel framework for generating realistic gestures for both interlocutors in dialog scenarios, conditioned on conversational audio. Unlike most existing methods, which focus solely on a single speaker, DialoGen simultaneously generates synchronized gestures for both participants while embedding identity-decoupled styles into the generated gestures, enhancing realism and expressiveness. To ensure precise synchronization between interlocutors, DialoGen adopts an interactive dual-diffusion model with mutual interaction estimation, which integrates interaction correlations into the diffusion process. More importantly, by leveraging supervised contrastive learning, we develop identity-decoupled style guidance that adaptively decomposes the identity-specific styles of interlocutors into a latent space, enabling multi-style dialog gesture generation. Extensive experimental results demonstrate that our model significantly outperforms existing methods in generating realistic, speech-aligned, identity-specific gestures, offering a high-quality solution for various dialog scenarios.
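The style guidance described above builds on supervised contrastive learning, which pulls together embeddings that share an identity label and pushes apart those that do not. Below is a minimal sketch of the standard supervised contrastive loss (Khosla et al., 2020) over identity-labeled style embeddings; the function name, array shapes, and temperature value are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Standard supervised contrastive loss (sketch, not the paper's code).

    features: (N, D) style embeddings; labels: (N,) identity labels.
    Samples sharing an identity label are treated as positives.
    """
    # L2-normalize so the dot product is cosine similarity
    features = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = features @ features.T / temperature
    np.fill_diagonal(sim, -np.inf)  # exclude self-similarity from the softmax
    # log-softmax over all other samples for each anchor
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    # positives: same identity, excluding the anchor itself
    pos_mask = (labels[:, None] == labels[None, :]) & ~np.eye(len(labels), dtype=bool)
    # negative mean log-likelihood of positives, averaged over anchors
    per_anchor = -np.where(pos_mask, log_prob, 0.0).sum(axis=1) \
                 / np.maximum(pos_mask.sum(axis=1), 1)
    return per_anchor.mean()
```

Minimizing this loss clusters embeddings of the same interlocutor identity while separating different identities, which is what allows the style codes to be decoupled from other factors in the latent space.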

Published

2026-03-14

How to Cite

Zhao, W., Wang, C., Hu, L., Li, Z., Yu, W., & Zhang, S. (2026). DialoGen: Towards Dialog Gesture Generation via Identity-Decoupled Style Guidance in Interactive Diffusion Model. Proceedings of the AAAI Conference on Artificial Intelligence, 40(16), 13253–13261. https://doi.org/10.1609/aaai.v40i16.38327

Section

AAAI Technical Track on Computer Vision XIII