Simplifying Control Mechanism in Text-to-Image Diffusion Models

Authors

  • Zhida Feng, School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, China; Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan University of Science and Technology, Wuhan, China; Baidu Inc.
  • Li Chen, School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, China; Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan University of Science and Technology, Wuhan, China
  • Yuenan Sun, School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, China; Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan University of Science and Technology, Wuhan, China
  • Jiaxiang Liu, Baidu Inc.
  • Shikun Feng, Baidu Inc.

DOI:

https://doi.org/10.1609/aaai.v39i3.32309

Abstract

ControlNet has significantly advanced controllable image generation by integrating dense conditions (such as depth maps and Canny edges) with text-to-image diffusion models. However, ControlNet's integration requires additional parameters amounting to nearly half of the base diffusion model's parameter count, which makes it inefficient. To address this, we introduce Simple-ControlNet, an efficient and streamlined network for controllable text-to-image generation. It employs a single-scale projection layer to incorporate condition information into the denoising U-Net, supplemented by Low-Rank Adaptation (LoRA) parameters that facilitate condition learning. Simple-ControlNet requires fewer than 3 million parameters for its control mechanism, less than 1% of the 300 million needed by ControlNet. Our extensive experiments confirm that Simple-ControlNet matches or surpasses ControlNet's performance across a broad range of tasks and base diffusion models, demonstrating its utility and efficiency.
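The abstract names two ingredients: a projection that injects condition features into the U-Net at a single scale, and LoRA parameters added to a frozen base model. The sketch below is a minimal pure-Python illustration of those two ideas under our own assumptions, not the authors' implementation. The names `LoRALinear` and `ConditionProjection`, the zero initialization of the projection, the rank, alpha, and the toy dimensions are all hypothetical. It shows why both pieces can start as a no-op on the frozen model and how few trainable parameters a low-rank update adds.

```python
import random

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]

def randn(rows, cols, scale, rng):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A (standard LoRA)."""
    def __init__(self, weight, rank, alpha, rng):
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight                   # frozen base weight, d_out x d_in
        self.A = randn(rank, d_in, 0.01, rng)  # trainable, r x d_in
        self.B = zeros(d_out, rank)            # trainable, zero-init so the update starts at 0
        self.scale = alpha / rank

    def __call__(self, x):
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        # r * d_in + d_out * r, versus d_out * d_in for full fine-tuning
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

class ConditionProjection:
    """Hypothetical single-scale injection: project condition features to the
    U-Net channel width and add them to one hidden state."""
    def __init__(self, c_cond, c_hidden):
        self.proj = zeros(c_hidden, c_cond)    # zero-init (assumption): no effect before training

    def __call__(self, hidden, cond):
        return [h + p for h, p in zip(hidden, matvec(self.proj, cond))]

rng = random.Random(0)
d = 8
base_w = randn(d, d, 0.1, rng)
layer = LoRALinear(base_w, rank=2, alpha=2.0, rng=rng)
inject = ConditionProjection(c_cond=3, c_hidden=d)

x = [1.0] * d
cond = [0.5, -0.5, 1.0]
y = layer(inject(x, cond))

# At initialization B and proj are zero, so the output equals the frozen layer's output.
assert y == matvec(base_w, x)
print(layer.trainable_params())  # 2*8 + 8*2 = 32, versus 64 for the full 8x8 weight
```

Starting both trainable pieces at zero is a common design choice for adapters: the pretrained diffusion model's behavior is untouched at step zero, and training only has to learn the condition-dependent correction.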

Published

2025-04-11

How to Cite

Feng, Z., Chen, L., Sun, Y., Liu, J., & Feng, S. (2025). Simplifying Control Mechanism in Text-to-Image Diffusion Models. Proceedings of the AAAI Conference on Artificial Intelligence, 39(3), 3013–3021. https://doi.org/10.1609/aaai.v39i3.32309

Section

AAAI Technical Track on Computer Vision II