A Geometric Perspective on Optimizing Vector Quantized Latent Diffusion Model for Image Restoration

Authors

  • Chen Hang East China Normal University
  • Haoming Chen East China Normal University
  • Xuwei Fang Bestpay AI Lab
  • Weisheng Xie Bestpay AI Lab
  • Xiangxiang Gao Bestpay AI Lab
  • Faming Fang East China Normal University
  • Guixu Zhang East China Normal University
  • Haichuan Song East China Normal University

DOI:

https://doi.org/10.1609/aaai.v40i6.42462

Abstract

In this paper, we investigate the limitations of the Vector Quantized Latent Diffusion Model (VQ-LDM) in restoration tasks. We identify a performance gap between the Vector Quantization (VQ) and diffusion model components, manifested as a significant discrepancy between the reconstruction quality of ground-truth images processed via VQ autoregression and that of degraded images restored by VQ-LDM. Through experiments, we attribute this gap primarily to the lack of robustness of the VQ mapped points within the original VQ-LDM framework. To address this issue, we propose a geometry-based optimization approach. First, we introduce a simple yet effective method, termed interpolation-based latent initial state optimization, which mitigates the performance gap by replacing the original mapped points with interpolated values and is supported by theoretical analysis. Here, the latent initial state refers specifically to the input of the diffusion model. Building on this, we further propose a Chebyshev center-based latent initial state optimization, an elegant theoretical solution from a geometric perspective that further enhances restoration performance. Our improvements consistently achieve superior results across nine benchmark datasets.
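
The abstract names two geometric operations, interpolation between latent points and the Chebyshev center, without giving their exact form. The sketch below is only a generic illustration of those two ideas, not the authors' implementation: `interpolate`, `chebyshev_center`, and `enclosing_radius` are hypothetical helper names, the weight `alpha` and the choice of candidate points are assumptions, and the Chebyshev center of a finite point set (the center of the smallest enclosing ball) is approximated with the standard Badoiu-Clarkson iteration rather than any solver described in the paper.

```python
import math


def interpolate(z_a, z_b, alpha):
    """Linear interpolation (1 - alpha) * z_a + alpha * z_b of two latent vectors.

    Assumption: the paper's interpolation may use different endpoints or weights.
    """
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(z_a, z_b)]


def chebyshev_center(points, iterations=2000):
    """Approximate the center of the smallest ball enclosing `points`.

    Badoiu-Clarkson iteration: repeatedly move the current center a shrinking
    step toward the point that is currently farthest away.
    """
    center = list(points[0])
    for i in range(1, iterations + 1):
        farthest = max(points, key=lambda p: math.dist(p, center))
        step = 1.0 / (i + 1)
        center = [c + step * (f - c) for c, f in zip(center, farthest)]
    return center


def enclosing_radius(points, center):
    """Radius of the smallest ball around `center` that contains all points."""
    return max(math.dist(p, center) for p in points)


if __name__ == "__main__":
    # Toy 2-D "latent" candidates; real latents would be high-dimensional.
    candidates = [[0.0, 0.0], [2.0, 0.0], [1.0, 0.5]]
    center = chebyshev_center(candidates)
    print("interpolated:", interpolate([0.0, 0.0], [2.0, 4.0], 0.25))
    print("approx. Chebyshev center:", center)
    print("enclosing radius:", enclosing_radius(candidates, center))
```

In this toy setting the Chebyshev center minimizes the worst-case distance to any candidate point, which is one way to read the robustness motivation in the abstract; how the candidate set is actually constructed is left to the paper itself.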

Published

2026-03-14

How to Cite

Hang, C., Chen, H., Fang, X., Xie, W., Gao, X., Fang, F., … Song, H. (2026). A Geometric Perspective on Optimizing Vector Quantized Latent Diffusion Model for Image Restoration. Proceedings of the AAAI Conference on Artificial Intelligence, 40(6), 4619–4626. https://doi.org/10.1609/aaai.v40i6.42462

Section

AAAI Technical Track on Computer Vision III