Simplifying Control Mechanism in Text-to-Image Diffusion Models

Zhida Feng; Li Chen; Yuenan Sun; Jiaxiang Liu; Shikun Feng

doi:10.1609/aaai.v39i3.32309

Authors

Zhida Feng: School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, China; Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan University of Science and Technology, Wuhan, China; Baidu Inc.
Li Chen: School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, China; Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan University of Science and Technology, Wuhan, China.
Yuenan Sun: School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, China; Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan University of Science and Technology, Wuhan, China.
Jiaxiang Liu: Baidu Inc.
Shikun Feng: Baidu Inc.

DOI:

https://doi.org/10.1609/aaai.v39i3.32309

Abstract

ControlNet has significantly advanced controllable image generation by integrating dense conditions (such as depth maps and Canny edges) with text-to-image diffusion models. However, ControlNet adds parameters amounting to nearly half of the base diffusion model's parameter count, which makes it inefficient. To address this, we introduce Simple-ControlNet, an efficient and streamlined network for controllable text-to-image generation. It employs a single-scale projection layer to incorporate condition information into the denoising U-Net, supplemented by Low-Rank Adaptation (LoRA) parameters to facilitate condition learning. Simple-ControlNet requires fewer than 3 million parameters for the control mechanism, substantially fewer than the 300 million needed by ControlNet. Our extensive experiments confirm that Simple-ControlNet matches or surpasses ControlNet's performance across a broad range of tasks and base diffusion models, demonstrating its utility and efficiency.
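
To give a rough sense of why a projection layer plus LoRA parameters can stay this small, the following sketch counts parameters under stated assumptions. It is not the authors' configuration: the layer shapes, the rank, and the choice of a 3x3 convolution as the projection layer are hypothetical values picked for illustration. Only the counting formulas are standard: a rank-r LoRA update to a d_out x d_in weight adds r * (d_in + d_out) parameters, and a 2D convolution has c_in * c_out * k * k weights plus c_out biases.

```python
def lora_param_count(d_in, d_out, rank):
    """Parameters added by a LoRA update W + B @ A,
    where A has shape (rank, d_in) and B has shape (d_out, rank)."""
    return rank * (d_in + d_out)


def conv_projection_param_count(c_in, c_out, kernel=3):
    """Weights plus biases of a single 2D convolution layer."""
    return c_in * c_out * kernel * kernel + c_out


# Hypothetical (d_in, d_out) shapes of adapted linear layers; NOT from the paper.
adapted_layers = [(320, 320)] * 16 + [(640, 640)] * 16 + [(1280, 1280)] * 32
rank = 8  # hypothetical LoRA rank

lora_total = sum(lora_param_count(i, o, rank) for i, o in adapted_layers)
# Assumed projection: one conv mapping a 3-channel condition map to 320 channels.
proj_total = conv_projection_param_count(c_in=3, c_out=320)
# For comparison: parameters in the full weight matrices being adapted.
dense_total = sum(i * o for i, o in adapted_layers)

print(f"LoRA parameters:       {lora_total:,}")
print(f"Projection parameters: {proj_total:,}")
print(f"Control total:         {lora_total + proj_total:,}")
print(f"Adapted dense weights: {dense_total:,}")
```

Under these assumed shapes the control parameters are a small fraction of the dense weights they adapt, which is the general effect the abstract relies on; the actual counts depend on the base model and the settings used in the paper.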
