ReMask-Animate: Refined Character Image Animation Using Mask-Guided Adapters

Authors

  • Xunzhi Xiang: Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen, China; 01AI, Beijing, China
  • Haiwei Xue: Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen, China; 01AI, Beijing, China; Tsinghua University, Shenzhen, Guangdong, China
  • Zonghong Dai: 01AI, Beijing, China
  • Di Wang: Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen, China
  • Minglei Li: 01AI, Beijing, China
  • Ye Yue: Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen, China
  • Fei Ma: Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen, China
  • Weijiang Yu: Sun Yat-sen University, Guangzhou, Guangdong, China
  • Heng Chang: Tsinghua University, Shenzhen, Guangdong, China
  • Fei Richard Yu: Shenzhen University, Shenzhen, Guangdong, China; Carleton University, Canada

DOI:

https://doi.org/10.1609/aaai.v39i8.32932

Abstract

Pose-controlled human video generation is of significant interest and has extensive applications in areas such as automated advertising and content creation on social media platforms. Although existing methods that use pose sequences and reference images for human image animation perform well, they tend to suffer from issues such as blurring in specific regions, background sharpening, and decreased identity consistency. In this paper, we introduce ReMask-Animate, which uses masks as additional priors to guide the model's local visual attention to specific areas, thereby alleviating feature confusion between different regions of the image. Three distinct mask-guided adapters are designed to fuse hand and face pose features regionally across conditions, mitigate feature confusion between the foreground and background, and enhance the visual consistency of character identity. Moreover, these lightweight adapters introduce minimal computational overhead and can be seamlessly integrated into specific layers of the backbone architecture. Extensive experiments show that our method outperforms state-of-the-art methods on five metrics across public datasets. Additionally, qualitative evaluations show a significant improvement in the quality of the generated videos.

Published

2025-04-11

How to Cite

Xiang, X., Xue, H., Dai, Z., Wang, D., Li, M., Yue, Y., … Yu, F. R. (2025). ReMask-Animate: Refined Character Image Animation Using Mask-Guided Adapters. Proceedings of the AAAI Conference on Artificial Intelligence, 39(8), 8628–8636. https://doi.org/10.1609/aaai.v39i8.32932

Section

AAAI Technical Track on Computer Vision VII