FIA-Edit: Frequency-Interactive Attention for Efficient and High-Fidelity Inversion-Free Text-Guided Image Editing

Authors

  • Kaixiang Yang, Huazhong University of Science and Technology
  • Boyang Shen, Huazhong University of Science and Technology
  • Xin Li, Huazhong University of Science and Technology
  • Yuchen Dai, Huazhong University of Science and Technology
  • Yuxuan Luo, Huazhong University of Science and Technology
  • Yueran Ma, Huazhong University of Science and Technology
  • Wei Fang, United Imaging Healthcare Co.
  • Qiang Li, Huazhong University of Science and Technology
  • Zhiwei Wang, Huazhong University of Science and Technology

DOI:

https://doi.org/10.1609/aaai.v40i14.38145

Abstract

Text-guided image editing has advanced rapidly with the rise of diffusion models. While flow-based inversion-free methods offer high efficiency by avoiding latent inversion, they often fail to integrate source information effectively, leading to poor background preservation, spatial inconsistencies, and over-editing. In this paper, we present FIA-Edit, a novel inversion-free framework that achieves high-fidelity and semantically precise edits through Frequency-Interactive Attention. Specifically, we design two key components: (1) a Frequency Representation Interaction (FRI) module that enhances cross-domain alignment by exchanging frequency components between source and target features within self-attention, and (2) a Feature Injection (FIJ) module that explicitly incorporates source-side queries, keys, values, and text embeddings into the target branch's cross-attention to preserve structure and semantics. Extensive experiments demonstrate that FIA-Edit supports high-fidelity editing at low computational cost (~6 s per 512 × 512 image on an RTX 4090) and consistently outperforms existing methods across diverse tasks in visual quality, background fidelity, and controllability. Furthermore, we are the first to extend text-guided image editing to clinical applications. By synthesizing anatomically coherent hemorrhage variations in surgical images, FIA-Edit opens new opportunities for medical data augmentation and delivers significant gains in downstream bleeding classification.
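
The idea of "exchanging frequency components" between a source and a target signal can be illustrated with a toy example. The sketch below is not the FRI module: the abstract does not say which frequency bands are exchanged, in which direction, or how the exchange is wired into self-attention. It assumes, purely for illustration, that the lowest frequency bands of a 1-D target signal are replaced by those of the source, so that coarse structure follows the source while fine detail follows the target. The names `dft`, `idft`, `exchange_low_frequencies`, and `cutoff` are hypothetical.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse DFT; returns the real part of each sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def exchange_low_frequencies(source, target, cutoff):
    """Replace the target's low-frequency bands with the source's.

    Illustrative assumption: bands whose distance from the DC term is at
    most `cutoff` come from `source`; all other bands stay from `target`.
    """
    n = len(source)
    if len(target) != n:
        raise ValueError("source and target must have the same length")
    src_spec, tgt_spec = dft(source), dft(target)
    mixed = list(tgt_spec)
    for k in range(n):
        # min(k, n - k) treats bins k and n - k as the same band, which
        # keeps the spectrum conjugate-symmetric and the output real.
        if min(k, n - k) <= cutoff:
            mixed[k] = src_spec[k]
    return idft(mixed)


# Smooth "source" structure and a rapidly alternating "target" detail.
source = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0]
target = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]

# Swap only the DC band: the result keeps the target's alternation
# but takes on the source's mean value.
mixed_dc = exchange_low_frequencies(source, target, cutoff=0)
# Swap every band: the result reproduces the source.
mixed_all = exchange_low_frequencies(source, target, cutoff=len(source) // 2)
```

With `cutoff=0` only the mean (DC) term is taken from the source, which is the smallest possible exchange. Raising `cutoff` moves progressively more of the source's coarse structure into the result. A practical implementation would presumably operate on 2-D feature maps with a fast FFT routine rather than this quadratic-time transform.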

Published

2026-03-14

How to Cite

Yang, K., Shen, B., Li, X., Dai, Y., Luo, Y., Ma, Y., … Wang, Z. (2026). FIA-Edit: Frequency-Interactive Attention for Efficient and High-Fidelity Inversion-Free Text-Guided Image Editing. Proceedings of the AAAI Conference on Artificial Intelligence, 40(14), 11613–11621. https://doi.org/10.1609/aaai.v40i14.38145

Section

AAAI Technical Track on Computer Vision XI