Object Fusion via Diffusion Time-step for Customized Image Editing with Single Example

Xue Song; Zhongqi Yue; Jiequan Cui; Hanwang Zhang; Jingjing Chen

doi:10.1609/aaai.v40i11.37869

Authors

Xue Song, College of Computer Science and Artificial Intelligence, Fudan University
Zhongqi Yue, College of Computing and Data Science, Nanyang Technological University
Jiequan Cui, School of Computer Science and Information Engineering, Hefei University of Technology
Hanwang Zhang, College of Computing and Data Science, Nanyang Technological University
Jingjing Chen, College of Computer Science and Artificial Intelligence, Fudan University; Institute of Trustworthy Embodied AI, Fudan University

DOI:

https://doi.org/10.1609/aaai.v40i11.37869

Abstract

We tackle the task of customized image editing using a text-conditioned Diffusion Model (DM). The goal is to fuse the subject in a reference image (e.g., sunglasses) with a source image (e.g., a boy) while preserving the fidelity of both (e.g., the boy wearing the sunglasses). An intuitive approach, called LoRA fusion, first trains a separate DM LoRA for each image to encode its details. The two LoRAs are then linearly combined using a weight to generate a fused image. Unfortunately, even with a careful grid search over the weight, or with a learned weight, this approach still trades off the fidelity of one image against the other. We point out that the root cause lies in the overlooked role of the diffusion time-step in the generation process, i.e., a smaller time-step controls the generation of a more fine-grained attribute. For example, a large LoRA weight for the source may help preserve its fine-grained details (e.g., face attributes) at a small time-step, but could overpower the reference subject LoRA and lose the fidelity of its overall shape at a larger time-step. To address this deficiency, we propose TimeFusion, which learns a time-step-specific LoRA fusion weight that resolves the trade-off, i.e., it generates the source and the reference subject in high fidelity given their respective prompts. We can then customize image editing using this weight and a target prompt.
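
The difference between a single fusion weight and a time-step-specific one can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the linear combination is taken to be a convex one (`w` for the source, `1 - w` for the reference), the names `fuse_lora`, `fused_update_at`, and `w_per_step` are hypothetical, the LoRA updates are toy 1x2 matrices, and the schedule is set by hand, whereas TimeFusion learns it.

```python
from typing import List

Matrix = List[List[float]]


def fuse_lora(delta_src: Matrix, delta_ref: Matrix, w_src: float) -> Matrix:
    """Linearly combine two LoRA weight updates.

    Assumption: a convex combination, w_src for the source update and
    (1 - w_src) for the reference update.
    """
    return [
        [w_src * s + (1.0 - w_src) * r for s, r in zip(row_s, row_r)]
        for row_s, row_r in zip(delta_src, delta_ref)
    ]


def fused_update_at(t: int, delta_src: Matrix, delta_ref: Matrix,
                    w_per_step: List[float]) -> Matrix:
    """Look up the fusion weight for time-step t and fuse with it."""
    return fuse_lora(delta_src, delta_ref, w_per_step[t])


# Toy 1x2 "LoRA updates" standing in for the source and reference adapters.
delta_src = [[1.0, 0.0]]
delta_ref = [[0.0, 1.0]]

# Hypothetical hand-set schedule, indexed by time-step (0 = smallest).
# Small time-steps favour the source (fine-grained details); large
# time-steps favour the reference (overall shape). Illustrative values only.
w_per_step = [1.0, 0.75, 0.25, 0.0]

fused_small_t = fused_update_at(1, delta_src, delta_ref, w_per_step)
fused_large_t = fused_update_at(2, delta_src, delta_ref, w_per_step)
```

With one constant weight, `fuse_lora` returns the same mixture at every time-step, which is the trade-off the abstract describes. Indexing the weight by time-step lets the source dominate where fine details are generated and the reference dominate where overall shape is generated.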
