Frequency-Controlled Diffusion Model for Versatile Text-Guided Image-to-Image Translation
DOI:
https://doi.org/10.1609/aaai.v38i3.27951
Keywords:
CV: Computational Photography, Image & Video Synthesis; CV: Multi-modal Vision
Abstract
Recently, text-to-image diffusion models have emerged as a powerful tool for image-to-image translation (I2I), allowing flexible image translation via user-provided text prompts. This paper proposes the frequency-controlled diffusion model (FCDiffusion), an end-to-end diffusion-based framework that offers a novel solution to text-guided I2I from a frequency-domain perspective. At the heart of our framework is a feature-space frequency-domain filtering module based on the Discrete Cosine Transform (DCT), which extracts image features carrying different DCT spectral bands to control the text-to-image generation process of the Latent Diffusion Model, realizing versatile I2I applications including style-guided content creation, image semantic manipulation, image scene translation, and image style translation. Unlike related methods, FCDiffusion establishes a unified text-driven I2I framework that suits diverse application scenarios simply by switching among different frequency control branches. The effectiveness and superiority of our method for text-guided I2I are demonstrated qualitatively and quantitatively in extensive experiments. Our project is publicly available at: https://xianggao1102.github.io/FCDiffusion/.
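To make the core idea concrete, the following is a minimal sketch of DCT-band filtering applied to a feature map, the kind of operation the abstract's filtering module describes. It is not the authors' implementation: the function name dct_band_filter, the (C, H, W) tensor layout, and the band thresholds are illustrative assumptions.

# Minimal sketch of DCT-band filtering on a feature map. Hypothetical
# helper; band thresholds and mask shape are illustrative, not the
# paper's actual design.
import numpy as np
from scipy.fft import dctn, idctn

def dct_band_filter(feat: np.ndarray, low: float, high: float) -> np.ndarray:
    """Keep DCT coefficients whose normalized frequency falls in [low, high).

    feat: (C, H, W) feature map; low/high in [0, 1] select a spectral band
    (e.g. low=0.0, high=0.1 keeps only the low-frequency components).
    """
    C, H, W = feat.shape
    # Normalized "frequency index" for each DCT coefficient position:
    # small (u + v) values correspond to low spatial frequencies.
    u = np.arange(H)[:, None] / H
    v = np.arange(W)[None, :] / W
    radius = u + v                      # ranges over [0, 2)
    mask = (radius >= 2 * low) & (radius < 2 * high)

    out = np.empty_like(feat)
    for c in range(C):
        coef = dctn(feat[c], norm="ortho")          # 2D DCT per channel
        out[c] = idctn(coef * mask, norm="ortho")   # inverse DCT of the band
    return out

# Example: split a feature map into low- and high-frequency parts.
feat = np.random.randn(4, 32, 32).astype(np.float32)
low_band = dct_band_filter(feat, 0.0, 0.1)    # coarse structure
high_band = dct_band_filter(feat, 0.1, 1.0)   # fine detail and contours

In this reading, routing different bands to the generation process trades off what is preserved from the source image (coarse layout vs. fine detail), which is why switching bands can serve different translation tasks.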
Published
2024-03-24
How to Cite
Gao, X., Xu, Z., Zhao, J., & Liu, J. (2024). Frequency-Controlled Diffusion Model for Versatile Text-Guided Image-to-Image Translation. Proceedings of the AAAI Conference on Artificial Intelligence, 38(3), 1824-1832. https://doi.org/10.1609/aaai.v38i3.27951
Issue
Vol. 38 No. 3 (2024)
Section
AAAI Technical Track on Computer Vision II