MoDiTalker: Motion-Disentangled Diffusion Model for High-Fidelity Talking Head Generation

Authors

  • Seyeon Kim (Korea University; Samsung Electronics)
  • Siyoon Jin (Korea University)
  • Jihye Park (Korea University; Samsung Electronics)
  • Kihong Kim (VIVE STUDIOS)
  • Jiyoung Kim (Korea University)
  • Jisu Nam (Korea Advanced Institute of Science & Technology)
  • Seungryong Kim (Korea Advanced Institute of Science & Technology)

DOI:

https://doi.org/10.1609/aaai.v39i4.32452

Abstract

Conventional GAN-based models for talking head generation often suffer from limited quality and unstable training. Recent approaches based on diffusion models have attempted to address these limitations and improve fidelity. However, they still face challenges, such as intensive sampling times and difficulties in maintaining temporal consistency due to the high stochasticity of diffusion models. To overcome these challenges, we propose a novel motion-disentangled diffusion model for high-quality talking head generation, called MoDiTalker. We introduce two modules: the Audio-To-Motion (AToM) module, designed to generate synchronized lip movements from audio, and the Motion-To-Video (MToV) module, designed to produce high-quality talking head videos based on the generated motions. AToM excels in capturing subtle lip movements by leveraging an audio attention mechanism. Additionally, MToV enhances temporal consistency by utilizing an efficient tri-plane representation. Our experiments on standard benchmarks demonstrate that our model outperforms existing GAN-based and diffusion-based models. We also provide comprehensive ablation studies and user study results.

Published

2025-04-11

How to Cite

Kim, S., Jin, S., Park, J., Kim, K., Kim, J., Nam, J., & Kim, S. (2025). MoDiTalker: Motion-Disentangled Diffusion Model for High-Fidelity Talking Head Generation. Proceedings of the AAAI Conference on Artificial Intelligence, 39(4), 4302–4310. https://doi.org/10.1609/aaai.v39i4.32452

Section

AAAI Technical Track on Computer Vision III