Frequency-Controlled Diffusion Model for Versatile Text-Guided Image-to-Image Translation

Authors

  • Xiang Gao, Peking University
  • Zhengbo Xu, Peking University
  • Junhan Zhao, Peking University
  • Jiaying Liu, Peking University

DOI:

https://doi.org/10.1609/aaai.v38i3.27951

Keywords:

CV: Computational Photography, Image & Video Synthesis, CV: Multi-modal Vision

Abstract

Recently, text-to-image diffusion models have emerged as a powerful tool for image-to-image translation (I2I), allowing flexible image translation via user-provided text prompts. This paper proposes the frequency-controlled diffusion model (FCDiffusion), an end-to-end diffusion-based framework that contributes a novel solution to text-guided I2I from a frequency-domain perspective. At the heart of our framework is a feature-space frequency-domain filtering module based on the Discrete Cosine Transform (DCT), which extracts image features carrying different DCT spectral bands to control the text-to-image generation process of the Latent Diffusion Model, realizing versatile I2I applications including style-guided content creation, image semantic manipulation, image scene translation, and image style translation. Unlike related methods, FCDiffusion establishes a unified text-driven I2I framework that suits diverse I2I application scenarios simply by switching among different frequency control branches. The effectiveness and superiority of our method for text-guided I2I are demonstrated both qualitatively and quantitatively through extensive experiments. Our project is publicly available at: https://xianggao1102.github.io/FCDiffusion/.
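To make the core idea concrete, below is a minimal, self-contained PyTorch sketch of feature-space DCT band filtering in the spirit the abstract describes. This is not the authors' implementation (which lives at the project page above): the function names, the orthonormal separable 2D DCT, the diagonal band mask, and the `cutoff` parameterization are all illustrative assumptions.

```python
import torch


def dct_matrix(n: int) -> torch.Tensor:
    # Orthonormal DCT-II basis matrix of size (n, n); its transpose is its inverse.
    k = torch.arange(n).unsqueeze(1).float()  # frequency index
    i = torch.arange(n).unsqueeze(0).float()  # spatial index
    basis = torch.cos(torch.pi * (2 * i + 1) * k / (2 * n))
    basis[0] *= 1.0 / torch.sqrt(torch.tensor(2.0))
    return basis * torch.sqrt(torch.tensor(2.0 / n))


def dct_band_filter(feat: torch.Tensor, band: str = "low", cutoff: float = 0.25) -> torch.Tensor:
    """Keep one DCT spectral band of a latent feature map (illustrative sketch).

    feat: (B, C, H, W) latent features. band='low' keeps frequencies below the
    cutoff fraction of the spectrum; band='high' keeps the complement.
    """
    _, _, h, w = feat.shape
    dh, dw = dct_matrix(h).to(feat), dct_matrix(w).to(feat)
    # Separable 2D DCT over the two spatial dims.
    spec = dh @ feat @ dw.T
    # Binary band mask over (u, v) frequency indices (diagonal split is an assumption).
    u = torch.arange(h, device=feat.device).view(h, 1)
    v = torch.arange(w, device=feat.device).view(1, w)
    low = (u / h + v / w) < 2 * cutoff
    spec = spec * (low if band == "low" else ~low)
    # Inverse 2D DCT (orthonormal basis, so transposes invert the transform).
    return dh.T @ spec @ dw
```

In a ControlNet-style setup, such band-filtered features would serve as the control signal injected into the denoising network, and switching `band`/`cutoff` corresponds to switching among the paper's frequency control branches (e.g., low-frequency control preserving layout for style translation, high-frequency control preserving detail for semantic manipulation).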

Published

2024-03-24

How to Cite

Gao, X., Xu, Z., Zhao, J., & Liu, J. (2024). Frequency-Controlled Diffusion Model for Versatile Text-Guided Image-to-Image Translation. Proceedings of the AAAI Conference on Artificial Intelligence, 38(3), 1824-1832. https://doi.org/10.1609/aaai.v38i3.27951

Issue

Vol. 38 No. 3 (2024)

Section

AAAI Technical Track on Computer Vision II