Transformer-Based Selective Super-resolution for Efficient Image Refinement

Authors

  • Tianyi Zhang, University of Minnesota
  • Kishore Kasichainula, Arizona State University
  • Yaoxin Zhuo, Arizona State University
  • Baoxin Li, Arizona State University
  • Jae-Sun Seo, Cornell Tech
  • Yu Cao, University of Minnesota

DOI:

https://doi.org/10.1609/aaai.v38i7.28560

Keywords:

CV: Other Foundations of Computer Vision, CV: Representation Learning for Vision, CV: Computational Photography, Image & Video Synthesis, ML: Deep Generative Models & Autoencoders

Abstract

Conventional super-resolution methods suffer from two drawbacks: the substantial computational cost of upscaling an entire large image, and the introduction of extraneous or potentially detrimental information for downstream computer vision tasks when the background is refined. To address these issues, we propose a novel transformer-based algorithm, Selective Super-Resolution (SSR), which partitions images into non-overlapping tiles, selects tiles of interest at various scales with a pyramid architecture, and reconstructs only these selected tiles with deep features. Experimental results on three datasets demonstrate the efficiency and robust performance of our approach for super-resolution. Compared to state-of-the-art methods, the FID score is reduced from 26.78 to 10.41 with a 40% reduction in computation cost on the BDD100K dataset.
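
The partition, select, and reconstruct flow described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the variance-based selection score, and the nearest-neighbour default for `refine_fn` are all assumptions made for illustration; in SSR the selection and reconstruction are learned transformer modules, and the multi-scale pyramid is omitted here.

```python
import numpy as np


def partition_tiles(image, tile):
    """Split an (H, W) image into non-overlapping tile x tile blocks."""
    h, w = image.shape
    if h % tile or w % tile:
        raise ValueError("image size must be a multiple of the tile size")
    rows, cols = h // tile, w // tile
    # (rows, tile, cols, tile) -> (rows, cols, tile, tile)
    return image.reshape(rows, tile, cols, tile).swapaxes(1, 2)


def select_tiles(tiles, keep_ratio):
    """Hypothetical selector: keep the highest-variance tiles.

    Stand-in for the learned tile selection; ties at the threshold
    are all kept.
    """
    scores = tiles.var(axis=(2, 3))
    k = max(1, int(round(keep_ratio * scores.size)))
    threshold = np.sort(scores.ravel())[-k]
    return scores >= threshold


def nearest_upscale(tile, scale):
    """Placeholder refinement: plain nearest-neighbour upscaling."""
    return np.repeat(np.repeat(tile, scale, axis=0), scale, axis=1)


def selective_upscale(image, tile=4, scale=2, keep_ratio=0.25,
                      refine_fn=nearest_upscale):
    """Upscale everything cheaply, then refine only the selected tiles."""
    tiles = partition_tiles(image, tile)
    mask = select_tiles(tiles, keep_ratio)
    # Cheap path for the whole image (background included).
    out = nearest_upscale(image, scale).astype(float)
    step = tile * scale
    for r, c in zip(*np.nonzero(mask)):
        # Expensive path runs only on tiles of interest.
        out[r * step:(r + 1) * step, c * step:(c + 1) * step] = refine_fn(
            tiles[r, c], scale
        )
    return out, mask
```

In this sketch the computational saving comes from calling `refine_fn` on only a fraction of the tiles, set by the assumed `keep_ratio` parameter; any per-tile super-resolution model with the same signature could be passed in its place.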

Published

2024-03-24

How to Cite

Zhang, T., Kasichainula, K., Zhuo, Y., Li, B., Seo, J.-S., & Cao, Y. (2024). Transformer-Based Selective Super-resolution for Efficient Image Refinement. Proceedings of the AAAI Conference on Artificial Intelligence, 38(7), 7305–7313. https://doi.org/10.1609/aaai.v38i7.28560

Issue

Vol. 38 No. 7 (2024)

Section

AAAI Technical Track on Computer Vision VI