DialoGen: Towards Dialog Gesture Generation via Identity-Decoupled Style Guidance in Interactive Diffusion Model

Authors

  • Weiyu Zhao Harbin Institute of Technology
  • Chenyang Wang Harbin Institute of Technology
  • Liangxiao Hu Harbin Institute of Technology
  • Zonglin Li Harbin Institute of Technology
  • Wei Yu Tsinghua University
  • Shengping Zhang Harbin Institute of Technology

DOI:

https://doi.org/10.1609/aaai.v40i16.38327

Abstract

We propose DialoGen, a novel framework for generating realistic gestures for both interlocutors in dialog scenarios, conditioned on conversational audio. Unlike most existing methods, which focus solely on a single speaker, DialoGen simultaneously generates synchronized gestures for both participants while embedding identity-decoupled styles into the generated gestures, enhancing realism and expressiveness. To ensure precise synchronization between interlocutors, DialoGen adopts an interactive dual-diffusion model with mutual interaction estimation, which integrates interaction correlations into the diffusion process. More importantly, by leveraging supervised contrastive learning, we develop identity-decoupled style guidance that adaptively decomposes the identity-specific styles of interlocutors into a latent space, enabling multi-style dialog gesture generation. Extensive experimental results demonstrate that our model significantly outperforms existing methods in generating realistic, speech-aligned, identity-specific gestures, offering a high-quality solution for various dialog scenarios.
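The style guidance described above builds on supervised contrastive learning, which pulls together embeddings that share an identity label and pushes apart those that do not. Below is a minimal sketch of the standard supervised contrastive loss (Khosla et al., 2020) over identity-labeled style embeddings; the function name, array shapes, and temperature value are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Standard supervised contrastive loss (sketch, not the paper's code).

    features: (N, D) style embeddings; labels: (N,) identity labels.
    Samples sharing an identity label are treated as positives.
    """
    # L2-normalize so the dot product is cosine similarity
    features = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = features @ features.T / temperature
    np.fill_diagonal(sim, -np.inf)  # exclude self-similarity from the softmax
    # log-softmax over all other samples for each anchor
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    # positives: same identity, excluding the anchor itself
    pos_mask = (labels[:, None] == labels[None, :]) & ~np.eye(len(labels), dtype=bool)
    # negative mean log-likelihood of positives, averaged over anchors
    per_anchor = -np.where(pos_mask, log_prob, 0.0).sum(axis=1) \
                 / np.maximum(pos_mask.sum(axis=1), 1)
    return per_anchor.mean()
```

Minimizing this loss clusters embeddings of the same interlocutor identity while separating different identities, which is what allows the style codes to be decoupled from other factors in the latent space.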

Published

2026-03-14

How to Cite

Zhao, W., Wang, C., Hu, L., Li, Z., Yu, W., & Zhang, S. (2026). DialoGen: Towards Dialog Gesture Generation via Identity-Decoupled Style Guidance in Interactive Diffusion Model. Proceedings of the AAAI Conference on Artificial Intelligence, 40(16), 13253–13261. https://doi.org/10.1609/aaai.v40i16.38327

Section

AAAI Technical Track on Computer Vision XIII