FG-EmoTalk: Talking Head Video Generation with Fine-Grained Controllable Facial Expressions

Authors

  • Zhaoxu Sun Xiaobing.ai
  • Yuze Xuan Xiaobing.ai
  • Fang Liu State Key Laboratory of Media Convergence and Communication, Communication University of China
  • Yang Xiang Xiaobing.ai

DOI:

https://doi.org/10.1609/aaai.v38i5.28309

Keywords:

CV: Computational Photography, Image & Video Synthesis

Abstract

Although deep generative models have greatly improved one-shot video-driven talking head generation, few studies address fine-grained controllable facial expression editing, which is crucial for practical applications. Existing methods rely on a fixed set of predefined discrete emotion labels or simply copy expressions from input videos. This is limiting because expressions are complex, and methods that use only emotion labels cannot generate fine-grained, accurate, or mixed expressions. Generating talking head video with precise expressions is also difficult using 3D model-based approaches, as 3DMM only models facial movements and tends to produce deviations. In this paper, we propose a novel framework enabling fine-grained facial expression editing in talking face generation. Our goal is to achieve expression control by manipulating the intensities of individual facial Action Units (AUs) or groups of AUs. First, in contrast to existing methods, which decouple the face into pose and expression, we propose a disentanglement scheme that isolates three components of the human face: appearance, pose, and expression. Second, we propose to use input AUs to control muscle group intensities in the generated face, and we integrate the AU features with the disentangled expression latent code. Finally, we present a self-supervised training strategy with well-designed constraints. Experiments show that our method achieves fine-grained expression control, produces high-quality talking head videos, and outperforms baseline methods.
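
As a rough illustration of what controlling expressions through AU intensities could look like, the following minimal sketch edits a vector of AU intensities and combines it with an expression latent code. It is not the authors' implementation: the AU subset, the [0, 1] intensity range, the latent size, the random weights, and the additive linear fusion in `fuse` are all assumptions made for this example, since the abstract does not specify them.

```python
import numpy as np

# Hypothetical AU subset; the abstract does not list which AUs are used.
AU_NAMES = ["AU01", "AU02", "AU04", "AU06", "AU12", "AU15", "AU25"]

def edit_aus(intensities, changes):
    """Return a copy of the AU intensity vector with the named AUs overwritten.

    Values are clipped to [0, 1], an assumed normalization.
    """
    out = np.asarray(intensities, dtype=np.float64).copy()
    for name, value in changes.items():
        out[AU_NAMES.index(name)] = value
    return np.clip(out, 0.0, 1.0)

def fuse(expr_code, au_vec, w_au):
    """Assumed fusion: add a linear projection of the AU vector to the latent code."""
    return expr_code + w_au @ au_vec

rng = np.random.default_rng(0)
expr_code = rng.standard_normal(16)                     # stand-in for the expression latent code
w_au = rng.standard_normal((16, len(AU_NAMES))) * 0.1   # stand-in for learned weights

neutral = np.zeros(len(AU_NAMES))
# AU06 (cheek raiser) + AU12 (lip corner puller) roughly correspond to a smile.
smile = edit_aus(neutral, {"AU06": 0.6, "AU12": 0.9})
# Mixed expression: add AU04 (brow lowerer); the out-of-range value is clipped.
mixed = edit_aus(smile, {"AU04": 1.5})

fused = fuse(expr_code, smile, w_au)
```

Because each AU is a separate entry in the vector, mixed expressions such as a smile with lowered brows are expressed by setting several entries at once, which a single discrete emotion label cannot represent.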

Published

2024-03-24

How to Cite

Sun, Z., Xuan, Y., Liu, F., & Xiang, Y. (2024). FG-EmoTalk: Talking Head Video Generation with Fine-Grained Controllable Facial Expressions. Proceedings of the AAAI Conference on Artificial Intelligence, 38(5), 5043-5051. https://doi.org/10.1609/aaai.v38i5.28309

Issue

Section

AAAI Technical Track on Computer Vision IV