Dis²Booth: Learning Image Distribution with Disentangled Features for Text-to-Image Diffusion Models

Authors

  • Guanqi Ding University of Chinese Academy of Sciences Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences
  • Chengyu Yang Beijing Institute of Technology
  • Shuhui Wang Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences Peng Cheng Laboratory
  • Xincheng Li Huawei Technologies Ltd.
  • Jinzhe Zhang Huawei Technologies Ltd.
  • Xin Jin Huawei Technologies Ltd.
  • Qingming Huang University of Chinese Academy of Sciences Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences Peng Cheng Laboratory

DOI:

https://doi.org/10.1609/aaai.v39i3.32279

Abstract

Personalized image generation enables customized content creation based on text-to-image diffusion models. However, existing personalization methods focus on fine-tuning generative models to generate a specific single individual or concept, such as an image of a specific Corgi, and cannot generate data for multiple individuals or concepts sharing common characteristics, such as images of multiple different Corgis. In this work, we focus on personalizing a diffusion model to generate varied data that usually contain multiple subjects and thus exhibit a more diverse and complex data distribution. Our basic assumption is that the varied data distribution is composed of the common features shared among all samples and the reasonable variations within them. Accordingly, we can decompose the learning of a complex data distribution into two simpler sub-tasks, following a divide-and-conquer approach. To this end, we propose Dis²Booth, a framework that learns a complex image Distribution by Disentangling the data distribution in an unsupervised manner. Specifically, Dis²Booth contains two modules, Anchor LoRA and Delta LoRA, which learn the common features and the variational features, respectively, under the constraints of a Contextual Loss and a Delta Loss, without supervision. In addition, an Asynchronous Optimization Strategy is proposed to ensure the collaborative training of the two modules. Extensive experiments suggest that Dis²Booth learns data distributions with higher diversity and complexity while maintaining the same level of flexibility as LoRA.
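The abstract's core idea, two additive low-rank adapters on a frozen base model, one for shared (anchor) features and one for variation (delta), can be illustrated with a minimal sketch. This is not the paper's implementation: the class names, rank, and the simple additive composition `W + ΔW_anchor + ΔW_delta` are illustrative assumptions; the losses and asynchronous optimization are omitted.

```python
import numpy as np


class LoRAAdapter:
    """Standard LoRA update: a low-rank term B @ A added to a frozen weight."""

    def __init__(self, d_out, d_in, rank, rng):
        self.A = rng.standard_normal((rank, d_in)) * 0.01
        self.B = np.zeros((d_out, rank))  # zero-init so the update starts at 0

    def delta_w(self):
        return self.B @ self.A


class DisentangledLinear:
    """Frozen base weight plus two additive adapters (hypothetical sketch):
    an 'anchor' adapter intended for features common to all samples and a
    'delta' adapter intended for per-sample variation, echoing the paper's
    Anchor LoRA / Delta LoRA modules."""

    def __init__(self, W, rank=4, seed=0):
        rng = np.random.default_rng(seed)
        self.W = W  # frozen pretrained weight, never updated
        self.anchor = LoRAAdapter(*W.shape, rank, rng)
        self.delta = LoRAAdapter(*W.shape, rank, rng)

    def forward(self, x, use_delta=True):
        # Common features always applied; variation can be toggled off to
        # sample only the shared distribution.
        W_eff = self.W + self.anchor.delta_w()
        if use_delta:
            W_eff = W_eff + self.delta.delta_w()
        return W_eff @ x
```

Because both `B` matrices are zero-initialized, the layer initially reproduces the frozen base exactly; training would then move the anchor and delta terms apart under the respective losses.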

Published

2025-04-11

How to Cite

Ding, G., Yang, C., Wang, S., Li, X., Zhang, J., Jin, X., & Huang, Q. (2025). Dis²Booth: Learning Image Distribution with Disentangled Features for Text-to-Image Diffusion Models. Proceedings of the AAAI Conference on Artificial Intelligence, 39(3), 2744–2752. https://doi.org/10.1609/aaai.v39i3.32279

Section

AAAI Technical Track on Computer Vision II