RealPortrait: Realistic Portrait Animation with Diffusion Transformers

Authors

  • Zejun Yang, Tencent
  • Huawei Wei, Tencent
  • Zhisheng Wang, Tencent

DOI:

https://doi.org/10.1609/aaai.v39i9.33012

Abstract

We introduce RealPortrait, a framework based on Diffusion Transformers (DiT), designed to generate highly expressive and visually appealing portrait animations. Given a static portrait image, our method transfers complex facial expressions and head pose movements extracted from a driving video onto the portrait, transforming it into a lifelike video. Specifically, we exploit the robust spatial-temporal modeling capabilities of DiT to generate portrait videos that maintain high-fidelity visual details and temporal coherence. In contrast to conventional image-to-video generation frameworks that require a separate reference network, we incorporate an efficient reference attention within the DiT backbone, thereby avoiding the computational overhead of that network while achieving superior preservation of the reference appearance. In parallel, we integrate a ControlNet to precisely regulate intricate facial expressions and head poses. Unlike prior methods that use explicit sparse motion representations, such as facial landmarks or 3DMM coefficients, we adopt a dense implicit motion representation as the control guidance. This implicit motion representation excels at capturing nuanced emotional facial expressions and subtle non-rigid dynamics of the lips. To further enhance the generalization capability of the model, we augment the training dataset with a substantial volume of facial image data through random crop augmentation. This strategy makes the model robust across a wide variety of facial appearances and expressions. Empirical evaluations demonstrate that RealPortrait excels in generating portrait animations with highly realistic quality and exceptional temporal coherence in appearance retention.
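
The abstract does not spell out how the reference attention is built. One common way to inject a reference image into a transformer without a separate reference network is to let the video tokens attend over both themselves and the reference tokens in a single attention call. The NumPy sketch below illustrates that idea only; the function name, the shared projection matrices, the token counts, and the single-head layout are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def reference_attention(video_tokens, ref_tokens, w_q, w_k, w_v):
    """Single-head attention where video tokens also attend to reference tokens.

    Hypothetical sketch, not the paper's implementation.
    video_tokens: (N, d) tokens of the video being generated
    ref_tokens:   (M, d) tokens of the static reference portrait
    """
    # Queries come from the video tokens only.
    q = video_tokens @ w_q
    # Keys and values cover the video tokens plus the reference tokens,
    # so appearance information can flow from the reference image.
    context = np.concatenate([video_tokens, ref_tokens], axis=0)
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


rng = np.random.default_rng(0)
d = 8  # assumed token width, chosen only for the demo
video_tokens = rng.normal(size=(6, d))
ref_tokens = rng.normal(size=(4, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))

out, weights = reference_attention(video_tokens, ref_tokens, w_q, w_k, w_v)
print(out.shape, weights.shape)  # (6, 8) (6, 10)
```

In this sketch the reference tokens reuse the backbone's own projections, which is one way a design could avoid the cost of a second network; whether RealPortrait does exactly this is not stated in the abstract.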

Published

2025-04-11

How to Cite

Yang, Z., Wei, H., & Wang, Z. (2025). RealPortrait: Realistic Portrait Animation with Diffusion Transformers. Proceedings of the AAAI Conference on Artificial Intelligence, 39(9), 9345–9353. https://doi.org/10.1609/aaai.v39i9.33012

Section

AAAI Technical Track on Computer Vision VIII