MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement

Authors

  • Xu He Shenzhen International Graduate School, Tsinghua University
  • Zhiyong Wu Shenzhen International Graduate School, Tsinghua University; The Chinese University of Hong Kong
  • Xiaoyu Li Tencent
  • Di Kang Tencent
  • Chaopeng Zhang Tencent
  • Jiangnan Ye Shenzhen International Graduate School, Tsinghua University
  • Liyang Chen Shenzhen International Graduate School, Tsinghua University
  • Xiangjun Gao The Hong Kong University of Science and Technology
  • Han Zhang Stanford University
  • Haolin Zhuang Shenzhen International Graduate School, Tsinghua University

DOI:

https://doi.org/10.1609/aaai.v39i3.32356

Abstract

Existing works in single-image human reconstruction suffer either from weak generalizability due to insufficient training data or from 3D inconsistencies due to a lack of comprehensive multi-view knowledge. In this paper, we introduce MagicMan, a human-specific multi-view diffusion model that generates high-quality novel views from a single reference image. At its core, we leverage a pre-trained 2D diffusion model as the generative prior for generalizability, with the parametric SMPL-X model as the 3D body prior to promote 3D awareness. To maintain consistency while generating denser views for improved 3D human reconstruction, we introduce hybrid multi-view attention to facilitate efficient and thorough information interchange across views. In addition, we present a geometry-aware dual branch that generates concurrently in both the RGB and normal domains, further enhancing consistency via geometry cues. Finally, to address ill-shaped issues arising from inaccurate SMPL-X estimation, we propose a novel iterative refinement strategy that progressively optimizes SMPL-X accuracy while enhancing the quality and consistency of the generated multi-view images. Extensive experimental results demonstrate that our method significantly outperforms existing approaches in both novel view synthesis and subsequent 3D human reconstruction tasks.

Published

2025-04-11

How to Cite

He, X., Wu, Z., Li, X., Kang, D., Zhang, C., Ye, J., … Zhuang, H. (2025). MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement. Proceedings of the AAAI Conference on Artificial Intelligence, 39(3), 3437–3445. https://doi.org/10.1609/aaai.v39i3.32356

Section

AAAI Technical Track on Computer Vision II