Multi-Step Denoising Scheduled Sampling: Towards Alleviating Exposure Bias for Diffusion Models
DOI:
https://doi.org/10.1609/aaai.v38i5.28267
Keywords:
CV: Computational Photography, Image & Video Synthesis
Abstract
Denoising Diffusion Probabilistic Models (DDPMs) have achieved significant success in generation tasks. Nevertheless, their performance is harmed by exposure bias, i.e., the inherent discrepancy between training (the output of each step is computed independently from a given input) and inference (the output of each step is computed from an input obtained iteratively from the model's own earlier outputs). To our knowledge, a few works have tried to tackle this issue by modifying the training process of DDPMs, but they still perform unsatisfactorily because they 1) model the discrepancy only partially and 2) ignore the accumulation of prediction errors. To address these issues, we propose a multi-step denoising scheduled sampling (MDSS) strategy to alleviate exposure bias in DDPMs. Based on an analysis of the training and inference formulations of DDPMs, MDSS 1) comprehensively considers the influence of prediction errors on both the output of the model (the Gaussian noise) and the output of the step (the computed input signal of the next step), and 2) efficiently models prediction error accumulation by iterating a mathematical formulation multiple times, initialized from the one-step prediction error obtained from the model. Experimental comparisons with previous works demonstrate that our approach is more effective in mitigating exposure bias in DDPM, DDIM, and DPM-solver. In particular, MDSS achieves an FID score of 3.86 with 100 DDIM sampling steps on the CIFAR-10 dataset, whereas the second-best method obtains 4.78. The code will be available on GitHub.
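The training/inference discrepancy described in the abstract can be illustrated with a small toy sketch. It is not the MDSS method and is not taken from the paper: it only compares the input a model would see in training (noised directly from the ground truth) with the input it sees during iterative sampling (built from its own earlier predictions). The scalar data, the linear beta schedule values, the function names, and the "model" (a hypothetical stand-in that returns the true noise plus a constant error `bias`) are all assumptions made for illustration. The update used is the deterministic DDIM step.

```python
import math

def alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear schedule.
    Index 0 holds 1.0 (no noise); index t holds alpha_bar_t."""
    bars = [1.0]
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        bars.append(bars[-1] * (1.0 - beta))
    return bars

def training_input(x0, eps, ab_t):
    """Input seen in training: noised in one shot from the ground-truth x0."""
    return math.sqrt(ab_t) * x0 + math.sqrt(1.0 - ab_t) * eps

def ddim_step(x_t, eps_hat, ab_t, ab_prev):
    """Deterministic DDIM update (eta = 0) from step t to step t-1."""
    x0_hat = (x_t - math.sqrt(1.0 - ab_t) * eps_hat) / math.sqrt(ab_t)
    return math.sqrt(ab_prev) * x0_hat + math.sqrt(1.0 - ab_prev) * eps_hat

def input_gap(x0, eps, bias, num_steps=100):
    """Absolute gap between the inference-time input and the training-time
    input at each step, from t = num_steps - 1 down to t = 0."""
    bars = alpha_bars(num_steps)
    x_t = training_input(x0, eps, bars[num_steps])  # same starting point x_T
    gaps = []
    for t in range(num_steps, 0, -1):
        eps_hat = eps + bias  # hypothetical imperfect noise prediction
        x_t = ddim_step(x_t, eps_hat, bars[t], bars[t - 1])
        gaps.append(abs(x_t - training_input(x0, eps, bars[t - 1])))
    return gaps

if __name__ == "__main__":
    gaps = input_gap(x0=0.5, eps=1.2, bias=0.1)
    print("gap after first step:", gaps[0])
    print("gap after last step: ", gaps[-1])
```

With `bias=0.0` the two inputs coincide up to rounding at every step. With a nonzero `bias`, the gap grows as sampling proceeds in this toy setting, which is the kind of error accumulation the abstract refers to.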
Published
2024-03-24
How to Cite
Ren, Z., Zhan, Y., Ding, L., Wang, G., Wang, C., Fan, Z., & Tao, D. (2024). Multi-Step Denoising Scheduled Sampling: Towards Alleviating Exposure Bias for Diffusion Models. Proceedings of the AAAI Conference on Artificial Intelligence, 38(5), 4667-4675. https://doi.org/10.1609/aaai.v38i5.28267
Issue
Section
AAAI Technical Track on Computer Vision IV