IMAGDressing-v1: Customizable Virtual Dressing
DOI:
https://doi.org/10.1609/aaai.v39i7.32729
Abstract
Existing virtual try-on (VTON) methods provide only limited user control over garment attributes and generally overlook essential factors such as face, pose, and scene context. To address these limitations, we introduce the virtual dressing (VD) task, which aims to synthesize freely editable human images conditioned on fixed garments and optional user-defined inputs. We further propose a comprehensive affinity metric index (CAMI) to quantify the consistency between generated outputs and reference garments. We present IMAGDressing-v1, which leverages a garment-specific U-Net to integrate semantic features from CLIP and texture features from a VAE. To incorporate these garment features into a frozen denoising U-Net for flexible text-driven scene control, we employ a hybrid attention mechanism composed of frozen self-attention and trainable cross-attention layers. IMAGDressing-v1 seamlessly integrates with extension modules, such as ControlNet and IP-Adapter, enabling enhanced diversity and controllability. To alleviate data constraints, we introduce the Interactive Garment Pairing (IGPair) dataset, comprising over 300,000 garment–image pairs and a standardized data assembly pipeline. Extensive experiments demonstrate that IMAGDressing-v1 achieves state-of-the-art performance in controlled human image synthesis. The code and model will be available at https://github.com/muzishen/IMAGDressing.
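The hybrid attention mechanism described above can be sketched in PyTorch as follows. This is an illustrative reconstruction based only on the abstract, not the paper's released code: a frozen self-attention branch (inherited from the pretrained denoising U-Net) is combined with a trainable cross-attention branch whose keys and values come from the garment U-Net's features. All module names, dimensions, and the residual-sum fusion are assumptions for illustration.

```python
import torch
import torch.nn as nn


class HybridAttention(nn.Module):
    """Hypothetical sketch of the hybrid attention block from the abstract:
    frozen self-attention + trainable cross-attention over garment features.
    Dimensions and fusion strategy are illustrative assumptions."""

    def __init__(self, dim: int = 320, n_heads: int = 8, garment_dim: int = 320):
        super().__init__()
        # Frozen self-attention: in the paper this would come from the
        # pretrained denoising U-Net; here we freeze a fresh layer to
        # illustrate the pattern.
        self.self_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        for p in self.self_attn.parameters():
            p.requires_grad = False

        # Trainable cross-attention: queries from the denoising latents,
        # keys/values from the garment U-Net's features.
        self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.garment_proj = nn.Linear(garment_dim, dim)

    def forward(self, hidden: torch.Tensor, garment_feats: torch.Tensor) -> torch.Tensor:
        # hidden: (B, L, dim) latent tokens of the denoising U-Net.
        # garment_feats: (B, Lg, garment_dim) features from the garment U-Net.
        sa_out, _ = self.self_attn(hidden, hidden, hidden)
        g = self.garment_proj(garment_feats)
        ca_out, _ = self.cross_attn(hidden, g, g)
        # Residual-style fusion of the frozen and trainable branches
        # (an assumed combination rule, not stated in the abstract).
        return hidden + sa_out + ca_out


# Usage: inject garment features into a batch of latent tokens.
block = HybridAttention()
latents = torch.randn(2, 64, 320)        # e.g. 8x8 latent grid flattened
garment = torch.randn(2, 77, 320)        # garment U-Net feature tokens
out = block(latents, garment)            # shape (2, 64, 320)
```

Freezing the self-attention branch keeps the pretrained model's generative prior (and hence text-driven scene control) intact, while only the cross-attention and projection weights are updated to learn garment injection.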
Published
2025-04-11
How to Cite
Shen, F., Jiang, X., He, X., Ye, H., Wang, C., Du, X., … Tang, J. (2025). IMAGDressing-v1: Customizable Virtual Dressing. Proceedings of the AAAI Conference on Artificial Intelligence, 39(7), 6795–6804. https://doi.org/10.1609/aaai.v39i7.32729
Section
AAAI Technical Track on Computer Vision VI