StyleFM: Frequency Manipulation Empowered by Recursive Attention on Diffusion Models for Arbitrary Style Transfer
DOI:
https://doi.org/10.1609/aaai.v40i10.37730

Abstract
Given the remarkable performance of diffusion models in image generation, recent research has explored their adaptation to style transfer. However, current diffusion-based approaches face persistent challenges, such as style distortions and a reliance on textual prompts for content preservation. To address these limitations, we introduce StyleFM, a novel training-free diffusion-based style transfer approach that incorporates optimization strategies in both the frequency and temporal domains. The proposed method provides two core innovations: (1) Tripartite Frequency Manipulation: to tailor frequency manipulation more precisely, StyleFM introduces a tripartite frequency design with a buffer band that accounts for the overlap between content and style representations. In addition, StyleFM designs a frequency superposition editing method to achieve frequency enhancement. (2) Recursive Attention: StyleFM proposes a recursive attention strategy within the diffusion process, which enables the progressive and consistent injection of style information throughout the temporal process without relying on text guidance. Experiments demonstrate that StyleFM outperforms state-of-the-art methods, effectively preserving content fidelity while achieving sufficient style embedding.

Published
2026-03-14
How to Cite
Ma, Y., Liu, Z., Liu, S., & Basu, A. (2026). StyleFM: Frequency Manipulation Empowered by Recursive Attention on Diffusion Models for Arbitrary Style Transfer. Proceedings of the AAAI Conference on Artificial Intelligence, 40(10), 7865-7873. https://doi.org/10.1609/aaai.v40i10.37730
Issue
Section
AAAI Technical Track on Computer Vision VII
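The tripartite frequency design described in the abstract can be illustrated with a minimal sketch: an image spectrum is partitioned into a low band, a buffer band covering the content/style overlap, and a high band, which can then be recombined with per-band gains. This is not the paper's implementation; the radius thresholds `r_low`/`r_high`, the grayscale input, and the uniform per-band weights are all illustrative assumptions.

```python
import numpy as np

def tripartite_split(img, r_low=0.1, r_high=0.3):
    """Split a grayscale image into low, buffer, and high frequency bands.

    r_low and r_high are illustrative radius fractions (not values from
    the paper): frequencies below r_low form the low band, those between
    r_low and r_high form the buffer band modeling the content/style
    overlap, and the remainder forms the high band.
    """
    h, w = img.shape
    F = np.fft.fftshift(np.fft.fft2(img))
    yy, xx = np.mgrid[:h, :w]
    # normalized radial distance of each frequency from the spectrum center
    r = np.hypot((yy - h / 2) / h, (xx - w / 2) / w)
    masks = (r < r_low, (r >= r_low) & (r < r_high), r >= r_high)
    return [np.fft.ifft2(np.fft.ifftshift(F * m)).real for m in masks]

def superpose(bands, weights=(1.0, 1.0, 1.0)):
    """Recombine the bands with per-band gains -- a toy stand-in for
    frequency superposition editing (boosting a band amplifies the
    detail scale it carries)."""
    return sum(w * b for w, b in zip(weights, bands))
```

Because the three masks partition the spectrum, recombining with unit weights recovers the original image, while raising the high-band weight would emphasize fine, style-related detail.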