FIA-Edit: Frequency-Interactive Attention for Efficient and High-Fidelity Inversion-Free Text-Guided Image Editing

Kaixiang Yang; Boyang Shen; Xin Li; Yuchen Dai; Yuxuan Luo; Yueran Ma; Wei Fang; Qiang Li; Zhiwei Wang

doi:10.1609/aaai.v40i14.38145

Authors

Kaixiang Yang, Huazhong University of Science and Technology
Boyang Shen, Huazhong University of Science and Technology
Xin Li, Huazhong University of Science and Technology
Yuchen Dai, Huazhong University of Science and Technology
Yuxuan Luo, Huazhong University of Science and Technology
Yueran Ma, Huazhong University of Science and Technology
Wei Fang, United Imaging Healthcare Co.
Qiang Li, Huazhong University of Science and Technology
Zhiwei Wang, Huazhong University of Science and Technology

DOI:

https://doi.org/10.1609/aaai.v40i14.38145

Abstract

Text-guided image editing has advanced rapidly with the rise of diffusion models. While flow-based inversion-free methods offer high efficiency by avoiding latent inversion, they often fail to integrate source information effectively, leading to poor background preservation, spatial inconsistencies, and over-editing. In this paper, we present FIA-Edit, a novel inversion-free framework that achieves high-fidelity and semantically precise edits through Frequency-Interactive Attention. Specifically, we design two key components: (1) a Frequency Representation Interaction (FRI) module that enhances cross-domain alignment by exchanging frequency components between source and target features within self-attention, and (2) a Feature Injection (FIJ) module that explicitly incorporates source-side queries, keys, values, and text embeddings into the target branch's cross-attention to preserve structure and semantics. Extensive experiments demonstrate that FIA-Edit supports high-fidelity editing at low computational cost (~6 s per 512 × 512 image on an RTX 4090) and consistently outperforms existing methods across diverse tasks in visual quality, background fidelity, and controllability. Furthermore, we are the first to extend text-guided image editing to clinical applications. By synthesizing anatomically coherent hemorrhage variations in surgical images, FIA-Edit opens new opportunities for medical data augmentation and delivers significant gains in downstream bleeding classification.
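
The abstract does not say which frequency bands the FRI module exchanges, which transform it uses, or along which axis it operates. As a rough illustration of the general idea of exchanging frequency components between a source and a target feature, the sketch below assumes a 1D feature vector, a naive discrete Fourier transform, and a simple rule: keep the source's low-frequency band and the target's high-frequency band. The function names (`dft`, `idft`, `exchange_frequencies`) and the `cutoff` parameter are hypothetical and are not taken from the paper.

```python
import cmath


def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft(), including the 1/n normalisation."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def exchange_frequencies(source, target, cutoff):
    """Combine the source's low-frequency band with the target's high band.

    ASSUMPTION: this low/high split is an illustrative choice; the abstract
    only states that frequency components are exchanged between branches.
    """
    if len(source) != len(target):
        raise ValueError("source and target must have the same length")
    n = len(source)
    source_spectrum = dft(source)
    target_spectrum = dft(target)

    mixed = []
    for k in range(n):
        # Distance of bin k from the DC bin, folding in negative frequencies.
        frequency = min(k, n - k)
        if frequency <= cutoff:
            mixed.append(source_spectrum[k])
        else:
            mixed.append(target_spectrum[k])

    # Real inputs plus a symmetric mask give a real result up to rounding error.
    return [value.real for value in idft(mixed)]


if __name__ == "__main__":
    flat_source = [2.0] * 8                 # only a DC (lowest-frequency) component
    oscillating_target = [1.0, -1.0] * 4    # only the highest-frequency component
    print(exchange_frequencies(flat_source, oscillating_target, cutoff=1))
```

With these toy inputs the result keeps the source's constant level and the target's fast oscillation, so the values alternate around 3 and 1. In an attention layer the same operation would presumably be applied to 2D feature maps with a fast transform, which this sketch does not attempt.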
