AdaFormer: Efficient Transformer with Adaptive Token Sparsification for Image Super-resolution

Authors

  • Xiaotong Luo Xiamen University
  • Zekun Ai Xiamen University
  • Qiuyuan Liang Xiamen University
  • Ding Liu Bytedance
  • Yuan Xie East China Normal University
  • Yanyun Qu Xiamen University
  • Yun Fu Northeastern University

DOI:

https://doi.org/10.1609/aaai.v38i5.28194

Keywords:

CV: Low Level & Physics-based Vision

Abstract

Efficient transformer-based models have made remarkable progress in image super-resolution (SR). Most of these works design elaborate structures to accelerate transformer inference, yet propagate all feature tokens equally. They thus ignore an underlying characteristic of image content: different image regions have distinct restoration difficulties, especially in large images (2K-8K). As a result, they fail to achieve adaptive inference. In this work, we propose an adaptive token sparsification transformer (AdaFormer) to speed up model inference for image SR. Specifically, a texture-relevant sparse attention block with parallel global and local branches is introduced, aiming to integrate informative tokens from a global view rather than only within fixed local windows. Then, an early-exit strategy is designed to progressively halt tokens according to token importance. To estimate the plausibility of each token, we adopt a lightweight confidence estimator, which is constrained by an uncertainty-guided loss to produce a binary halting mask over the tokens. Experiments on large images show that our proposal reduces latency by nearly 90% relative to SwinIR on Test8K while maintaining comparable performance.
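
The early-exit idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the names `toy_confidence`, `toy_layer`, and `adaptive_forward`, the threshold value, and the hand-written confidence rule are all hypothetical stand-ins for the learned confidence estimator and the transformer blocks. The sketch only shows the control flow: once a token's confidence crosses a threshold, it is marked in a binary halting mask and skipped by later layers.

```python
def toy_confidence(token):
    """Hypothetical confidence in (0, 1]; higher means an 'easier' token.

    Assumption: a hand-written rule standing in for a learned estimator.
    """
    energy = sum(abs(v) for v in token) / len(token)
    return 1.0 / (1.0 + energy)


def toy_layer(token):
    """Placeholder for a transformer block: simply halves each value."""
    return [0.5 * v for v in token]


def adaptive_forward(tokens, num_layers=4, threshold=0.5):
    """Process tokens layer by layer, halting confident tokens early.

    Returns the final tokens, the binary halting mask, and the number of
    layers each token actually passed through.
    """
    tokens = [list(t) for t in tokens]      # copy so the input is untouched
    halted = [False] * len(tokens)          # binary halting mask
    exit_layer = [num_layers] * len(tokens)
    for layer in range(num_layers):
        for i, tok in enumerate(tokens):
            if halted[i]:
                continue                    # halted tokens skip this layer
            tokens[i] = toy_layer(tok)
            if toy_confidence(tokens[i]) >= threshold:
                halted[i] = True
                exit_layer[i] = layer + 1
    return tokens, halted, exit_layer


if __name__ == "__main__":
    # A smooth token, a moderately textured token, and a hard token.
    inputs = [[0.1, 0.1], [6.0, 6.0], [100.0, 100.0]]
    _, mask, exits = adaptive_forward(inputs)
    print(mask, exits)
```

In this toy setting the low-magnitude token exits after the first layer, while the high-magnitude token runs through every layer, which mirrors the intuition that regions with different restoration difficulty need different amounts of computation.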

Published

2024-03-24

How to Cite

Luo, X., Ai, Z., Liang, Q., Liu, D., Xie, Y., Qu, Y., & Fu, Y. (2024). AdaFormer: Efficient Transformer with Adaptive Token Sparsification for Image Super-resolution. Proceedings of the AAAI Conference on Artificial Intelligence, 38(5), 4009-4016. https://doi.org/10.1609/aaai.v38i5.28194

Section

AAAI Technical Track on Computer Vision IV