SCott: Accelerating Diffusion Models with Stochastic Consistency Distillation

Authors

  • Hongjian Liu University of Science and Technology of China
  • Qingsong Xie OPPO AI Center
  • Tianxiang Ye Shanghai Jiaotong University
  • Zhijie Deng Shanghai Jiaotong University
  • Chen Chen OPPO AI Center
  • Shixiang Tang The Chinese University of Hong Kong
  • Xueyang Fu University of Science and Technology of China
  • Haonan Lu OPPO AI Center
  • Zheng-Jun Zha University of Science and Technology of China

DOI:

https://doi.org/10.1609/aaai.v39i5.32580

Abstract

The iterative sampling procedure employed by diffusion models (DMs) often leads to significant latency. To address this, we propose Stochastic Consistency Distillation (SCott) to enable accelerated text-to-image generation, where high-quality generations can be achieved with just 2-4 sampling steps or even 1 step, and further improvements can be obtained at additional cost, e.g., 4 steps. In contrast to vanilla consistency distillation (CD), which distills the ordinary differential equation (ODE) solver-based sampling process of a pre-trained teacher model into a student, SCott explores the possibility and validates the efficacy of integrating stochastic differential equation (SDE) solvers into CD to fully unleash the potential of the teacher. SCott is augmented with elaborate strategies to control the noise strength and sampling process of the SDE solver. An adversarial loss is further incorporated to strengthen the sample quality with few sampling steps. Empirically, on the MSCOCO-2017 5K dataset with a Stable Diffusion-V1.5 teacher, SCott achieves an FID of 21.9, surpassing that of the 1-step InstaFlow (23.4) and the 4-step UFOGen (22.1). Moreover, SCott can yield more diverse samples than other consistency models for high-resolution image generation, with up to 16% improvement in a qualified metric.
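
To illustrate the difference between an ODE-style and an SDE-style solver step that the abstract refers to, the following is a minimal sketch using the well-known DDIM update with a noise-strength parameter eta. It is an assumption for illustration only: the abstract does not state which SDE solver or noise schedule SCott uses, and the function name `ddim_step` and its parameters are hypothetical. Scalars stand in for image latents.

```python
import math
import random


def ddim_step(x_t, eps_pred, alpha_t, alpha_prev, eta=0.0, rng=None):
    """One DDIM-style update from a noisier timestep to a cleaner one.

    alpha_t, alpha_prev: cumulative signal coefficients (alpha-bar),
        with 0 < alpha_t < alpha_prev <= 1.
    eta: noise strength. eta = 0 is the deterministic (ODE-like) update;
        eta > 0 injects fresh Gaussian noise (SDE-like update).
    NOTE: illustrative only; not taken from the SCott paper.
    """
    rng = rng or random
    # Estimate of the clean sample implied by the noise prediction.
    x0_pred = (x_t - math.sqrt(1 - alpha_t) * eps_pred) / math.sqrt(alpha_t)
    # Standard deviation of the noise injected at this step.
    sigma = (
        eta
        * math.sqrt((1 - alpha_prev) / (1 - alpha_t))
        * math.sqrt(1 - alpha_t / alpha_prev)
    )
    # Weight on the predicted noise direction after accounting for sigma.
    dir_coeff = math.sqrt(max(1 - alpha_prev - sigma**2, 0.0))
    noise = rng.gauss(0.0, 1.0) if sigma > 0 else 0.0
    return math.sqrt(alpha_prev) * x0_pred + dir_coeff * eps_pred + sigma * noise


# Toy usage: build a noisy sample from a known clean value and noise.
x0, eps = 0.5, -1.2
alpha_t, alpha_prev = 0.3, 0.7
x_t = math.sqrt(alpha_t) * x0 + math.sqrt(1 - alpha_t) * eps

deterministic = ddim_step(x_t, eps, alpha_t, alpha_prev, eta=0.0)
stochastic = ddim_step(x_t, eps, alpha_t, alpha_prev, eta=1.0,
                       rng=random.Random(0))
```

With eta set to 0 the step is fully determined by the model prediction, while larger eta values trade determinism for injected randomness, which is the kind of noise-strength control the abstract mentions in general terms.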

Published

2025-04-11

How to Cite

Liu, H., Xie, Q., Ye, T., Deng, Z., Chen, C., Tang, S., … Zha, Z.-J. (2025). SCott: Accelerating Diffusion Models with Stochastic Consistency Distillation. Proceedings of the AAAI Conference on Artificial Intelligence, 39(5), 5451–5459. https://doi.org/10.1609/aaai.v39i5.32580

Section

AAAI Technical Track on Computer Vision IV