IPVTON: Image-based 3D Virtual Try-on with Image Prompt Adapter

Authors

  • Xiaojing Zhong (South China University of Technology; Nanyang Technological University)
  • Zhonghua Wu (SenseTime Research)
  • Xiaofeng Yang (Nanyang Technological University)
  • Guosheng Lin (Nanyang Technological University)
  • Qingyao Wu (South China University of Technology; Peng Cheng Laboratory)

DOI:

https://doi.org/10.1609/aaai.v39i10.33159

Abstract

Given a pair of images depicting a person and a garment separately, image-based 3D virtual try-on methods aim to reconstruct a 3D human model that realistically portrays the person wearing the desired garment. In this paper, we present IPVTON, a novel image-based 3D virtual try-on framework. IPVTON employs score distillation sampling with image prompts to optimize a hybrid 3D human representation, integrating target garment features into diffusion priors through an image prompt adapter. To avoid interference with non-target areas, we leverage mask-guided image prompt embeddings to focus the image features on the try-on regions. Moreover, we impose geometric constraints on the 3D model with a pseudo silhouette generated by ControlNet, ensuring that the clothed 3D human model retains the shape of the source identity while accurately wearing the target garments. Extensive qualitative and quantitative experiments demonstrate that IPVTON outperforms previous methods in image-based 3D virtual try-on tasks, excelling in both geometry and texture.
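
For readers unfamiliar with score distillation sampling (SDS), its standard gradient from the text-to-3D literature can be written as below. This is a generic sketch, not the exact objective used in IPVTON. The symbols are notation assumed here and do not appear in the abstract: θ (parameters of the 3D representation), g (a differentiable renderer), c (the conditioning, which in this setting would include the image prompt embeddings), w(t) (a timestep weighting), and ε̂_φ (the diffusion model's noise prediction).

```latex
% Standard SDS gradient (generic form; notation assumed, not taken from the paper)
x = g(\theta), \qquad x_t = \alpha_t x + \sigma_t \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)

\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, c,\, t) - \epsilon\bigr)\,\frac{\partial x}{\partial \theta} \right]
```

Under this reading, using an image prompt instead of a text prompt changes what c contains. How the mask-guided embeddings and the silhouette constraint enter the full objective is specified in the paper itself.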

Published

2025-04-11

How to Cite

Zhong, X., Wu, Z., Yang, X., Lin, G., & Wu, Q. (2025). IPVTON: Image-based 3D Virtual Try-on with Image Prompt Adapter. Proceedings of the AAAI Conference on Artificial Intelligence, 39(10), 10671–10679. https://doi.org/10.1609/aaai.v39i10.33159

Issue

Vol. 39 No. 10 (2025)

Section

AAAI Technical Track on Computer Vision IX