Multi-Step Denoising Scheduled Sampling: Towards Alleviating Exposure Bias for Diffusion Models

Zhiyao Ren; Yibing Zhan; Liang Ding; Gaoang Wang; Chaoyue Wang; Zhongyi Fan; Dacheng Tao

doi:10.1609/aaai.v38i5.28267

Authors

Zhiyao Ren The University of Sydney
Yibing Zhan JD Explore Academy
Liang Ding JD Explore Academy
Gaoang Wang Zhejiang University
Chaoyue Wang The University of Sydney
Zhongyi Fan JD Explore Academy
Dacheng Tao The University of Sydney

DOI:

https://doi.org/10.1609/aaai.v38i5.28267

Keywords:

CV: Computational Photography, Image & Video Synthesis

Abstract

Denoising Diffusion Probabilistic Models (DDPMs) have achieved significant success in generation tasks. Nevertheless, the exposure bias issue, i.e., the natural discrepancy between the training (the output of each step is calculated individually by a given input) and inference (the output of each step is calculated based on the input iteratively obtained based on the model), harms the performance of DDPMs. To our knowledge, few works have tried to tackle this issue by modifying the training process for DDPMs, but they still perform unsatisfactorily due to 1) partially modeling the discrepancy and 2) ignoring the prediction error accumulation. To address the above issues, in this paper, we propose a multi-step denoising scheduled sampling (MDSS) strategy to alleviate the exposure bias for DDPMs. Analyzing the formulations of the training and inference of DDPMs, MDSS 1) comprehensively considers the discrepancy influence of prediction errors on the output of the model (the Gaussian noise) and the output of the step (the calculated input signal of the next step), and 2) efficiently models the prediction error accumulation by using multiple iterations of a mathematical formulation initialized from one-step prediction error obtained from the model. The experimental results, compared with previous works, demonstrate that our approach is more effective in mitigating exposure bias in DDPM, DDIM, and DPM-solver. In particular, MDSS achieves an FID score of 3.86 in 100 sample steps of DDIM on the CIFAR-10 dataset, whereas the second best obtains 4.78. The code will be available on GitHub.

Multi-Step Denoising Scheduled Sampling: Towards Alleviating Exposure Bias for Diffusion Models

Authors

DOI:

Keywords:

Abstract

Downloads

Published

How to Cite

Issue

Section

Information

Subscription