Iterative Token Evaluation and Refinement for Real-World Super-resolution

Authors

  • Chaofeng Chen (S-Lab, Nanyang Technological University)
  • Shangchen Zhou (S-Lab, Nanyang Technological University)
  • Liang Liao (S-Lab, Nanyang Technological University)
  • Haoning Wu (S-Lab, Nanyang Technological University)
  • Wenxiu Sun (SenseTime Research)
  • Qiong Yan (SenseTime Research)
  • Weisi Lin (S-Lab, Nanyang Technological University)

DOI:

https://doi.org/10.1609/aaai.v38i2.27861

Keywords:

CV: Computational Photography, Image & Video Synthesis, CV: Low Level & Physics-based Vision

Abstract

Real-world image super-resolution (RWSR) is a long-standing problem because low-quality (LQ) images often have complex and unidentified degradations. Existing methods have their own drawbacks: Generative Adversarial Networks (GANs) are difficult to train, while continuous diffusion models require numerous inference steps. In this paper, we propose an Iterative Token Evaluation and Refinement (ITER) framework for RWSR, which utilizes a discrete diffusion model operating in the discrete token representation space, i.e., indexes of features extracted from a VQGAN codebook pre-trained with high-quality (HQ) images. We show that ITER is easier to train than GANs and more efficient than continuous diffusion models. Specifically, we divide RWSR into two sub-tasks, i.e., distortion removal and texture generation. Distortion removal involves simple HQ token prediction from LQ images, while texture generation uses a discrete diffusion model to iteratively refine the distortion removal output with a token refinement network. In particular, we propose to include a token evaluation network in the discrete diffusion process. It learns to evaluate which tokens are good restorations and helps to improve the iterative refinement results. Moreover, the evaluation network can first check the status of the distortion removal output and then adaptively select the total number of refinement steps needed, thereby maintaining a good balance between distortion removal and texture generation. Extensive experimental results show that ITER is easy to train and performs well within just 8 iterative steps.
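
The evaluate-then-refine loop described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the function names, the 0.5 score threshold, the rule that maps the fraction of low-scoring tokens to a step count, and the toy stand-ins for the evaluation and refinement networks are all assumptions made for illustration. Only the overall pattern (score tokens, pick a step budget of at most 8, re-predict only the tokens judged bad) follows the abstract.

```python
import numpy as np


def choose_num_steps(scores, max_steps=8, threshold=0.5):
    """Hypothetical rule: the more low-scoring tokens, the more steps."""
    bad_fraction = float(np.mean(scores < threshold))
    return int(np.ceil(bad_fraction * max_steps))


def iterative_refine(tokens, evaluate, refine, max_steps=8, threshold=0.5):
    """Refine only the tokens that the evaluator scores below threshold.

    `evaluate` and `refine` stand in for the token evaluation and token
    refinement networks; their signatures here are assumptions.
    Returns the refined tokens and the number of steps that were planned.
    """
    tokens = np.array(tokens, copy=True)
    # Check the initial (distortion removal) output and pick a step budget.
    planned_steps = choose_num_steps(evaluate(tokens), max_steps, threshold)
    for _ in range(planned_steps):
        mask = evaluate(tokens) < threshold  # tokens judged to be bad
        if not mask.any():
            break
        proposal = refine(tokens, mask)
        # Keep tokens judged good, replace the rest with the proposal.
        tokens = np.where(mask, proposal, tokens)
    return tokens, planned_steps


# Toy stand-ins (assumptions): a known "HQ" token map plays the oracle.
TARGET = np.array([3, 1, 4, 1, 5, 9, 2, 6])


def toy_evaluate(tokens):
    # Score 1.0 for tokens matching the target, 0.0 otherwise.
    return (tokens == TARGET).astype(float)


def toy_refine(tokens, mask):
    # Fix a single masked token per call so several steps are needed.
    proposal = tokens.copy()
    first = int(np.argmax(mask))
    proposal[first] = TARGET[first]
    return proposal


initial = np.array([3, 0, 4, 0, 5, 9, 0, 6])  # three wrong tokens
refined, planned = iterative_refine(initial, toy_evaluate, toy_refine)
```

In this toy setting, three of the eight tokens start out wrong, so the assumed step rule plans three steps and each step repairs one token. In the actual method both roles would be played by learned networks operating on VQGAN codebook indexes.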

Published

2024-03-24

How to Cite

Chen, C., Zhou, S., Liao, L., Wu, H., Sun, W., Yan, Q., & Lin, W. (2024). Iterative Token Evaluation and Refinement for Real-World Super-resolution. Proceedings of the AAAI Conference on Artificial Intelligence, 38(2), 1010-1018. https://doi.org/10.1609/aaai.v38i2.27861

Issue

Section

AAAI Technical Track on Computer Vision I