JR2Net: Joint Monocular 3D Face Reconstruction and Reenactment


  • Jiaxiang Shang, Hong Kong University of Science and Technology
  • Yu Zeng, Johns Hopkins University
  • Xin Qiao, Tencent
  • Xin Wang, Tencent
  • Runze Zhang, Tencent
  • Guangyuan Sun, Tencent
  • Vishal Patel, Johns Hopkins University
  • Hongbo Fu, City University of Hong Kong




CV: Biometrics, Face, Gesture & Pose


Face reenactment and reconstruction benefit various applications in self-media, VR, etc. Recent face reenactment methods use 2D facial landmarks to implicitly retarget facial expressions and poses from driving videos to source images, but they suffer from pose and expression preservation issues in cross-identity scenarios, i.e., when the source and the driving subjects are different. Current self-supervised face reconstruction methods also demonstrate impressive results. However, these methods do not handle large expressions well, since their training data lacks samples of large expressions, and 2D facial attributes are inaccurate on such samples. To mitigate these problems, we propose to explore the inner connection between the two tasks: face reconstruction provides sufficient 3D information for reenactment, and face reenactment synthesizes videos paired with captured face model parameters to enhance the expression module of face reconstruction. In particular, we propose a novel cascade framework named JR2Net for Joint Face Reconstruction and Reenactment, which begins with the training of a coarse reconstruction network, followed by a 3D-aware face reenactment network based on the coarse reconstruction results. Finally, we train an expression tracking network on our synthesized videos, which are composed of pairs of images and face model parameters. This expression tracking network can further enhance the coarse face reconstruction. Extensive experiments show that our JR2Net outperforms the state-of-the-art methods on several face reconstruction and reenactment benchmarks.




How to Cite

Shang, J., Zeng, Y., Qiao, X., Wang, X., Zhang, R., Sun, G., Patel, V., & Fu, H. (2023). JR2Net: Joint Monocular 3D Face Reconstruction and Reenactment. Proceedings of the AAAI Conference on Artificial Intelligence, 37(2), 2200-2208. https://doi.org/10.1609/aaai.v37i2.25314



AAAI Technical Track on Computer Vision II