X2Edit: Revisiting Arbitrary-Instruction Image Editing Through Self-Constructed Data and Task-Aware Representation Learning

Jian Ma; Xujie Zhu; Zihao Pan; Qirong Peng; Xu Guo; Chen Chen; Haonan Lu

doi:10.1609/aaai.v40i10.37719

Authors

Jian Ma, OPPO AI Center
Xujie Zhu, Sun Yat-sen University
Zihao Pan, Sun Yat-sen University
Qirong Peng, OPPO AI Center
Xu Guo, Tsinghua University
Chen Chen, OPPO AI Center
Haonan Lu, OPPO AI Center

DOI:

https://doi.org/10.1609/aaai.v40i10.37719

Abstract

Existing open-source datasets for arbitrary-instruction image editing remain suboptimal, and a plug-and-play editing module compatible with the generative models prevalent in the community is notably absent. In this paper, we first introduce the X2Edit Dataset, a comprehensive dataset covering 14 diverse editing tasks, including subject-driven generation. We construct the data with industry-leading unified image generation models and expert models, design reasonable editing instructions with a VLM, and apply various scoring mechanisms to filter the results. This yields 3.7 million high-quality samples with balanced categories. Second, to integrate seamlessly with community image generation models, we design task-aware MoE-LoRA training based on FLUX.1, using only 8% of the parameters of the full model. To further improve final performance, we introduce contrastive learning on the internal representations of the diffusion model, defining positive and negative samples by image editing type. Extensive experiments demonstrate that the model's editing performance is competitive with many strong existing models. Additionally, the constructed dataset exhibits substantial advantages over existing open-source datasets.
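
The idea of defining positive and negative samples by editing type can be illustrated with a generic supervised-contrastive-style loss. The sketch below is not the paper's formulation: the abstract does not specify the loss, the layer the representations come from, or the temperature, so the function name `task_contrastive_loss`, the cosine similarity, and the value `temperature=0.1` are all assumptions chosen for illustration. Features that share an editing-task id are treated as positives and all others as negatives.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def task_contrastive_loss(features, task_ids, temperature=0.1):
    """Hypothetical supervised-contrastive-style loss.

    features: list of feature vectors (stand-ins for internal representations).
    task_ids: one editing-task label per feature; equal labels are positives.
    temperature: assumed value, not taken from the paper.
    """
    n = len(features)
    total, counted = 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        positives = [j for j in others if task_ids[j] == task_ids[i]]
        if not positives:
            continue  # an anchor with no positive contributes nothing
        logits = {j: cosine(features[i], features[j]) / temperature for j in others}
        # Numerically stable log-sum-exp over all other samples.
        peak = max(logits.values())
        log_denom = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


# Toy features: two tight clusters in 2D.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_aligned = task_contrastive_loss(feats, [0, 0, 1, 1])  # labels match clusters
loss_mixed = task_contrastive_loss(feats, [0, 1, 0, 1])    # labels cut across clusters
```

In this toy setup the loss is lower when the task labels agree with the feature clusters than when they cut across them, which is the behavior such an objective is meant to encourage.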
