Event-Guided Scene Text Image Super-Resolution

Authors

  • Zihan Qi Hefei University of Technology
  • Zeyu Xiao National University of Singapore
  • Haoyi Zhao Hefei University of Technology
  • Yang Zhao Hefei University of Technology
  • Feng Xue Hefei University of Technology
  • Wei Jia Hefei University of Technology

DOI:

https://doi.org/10.1609/aaai.v40i10.37801

Abstract

Scene text image super-resolution aims to improve text legibility by recovering high-resolution text images from low-resolution inputs. However, preserving fine details such as text strokes and edges, and keeping the recovered text accurate, remains challenging, particularly in low-light environments and high-speed motion scenarios, where degradation is more severe. Event cameras, with their high temporal resolution and ability to capture intensity changes, offer a promising way to restore lost fine details and mitigate degradation under these conditions. In this paper, we propose EvTSR, the first framework that integrates Event data into scene Text image Super-Resolution. The core of EvTSR is the dual-stream frequency boost (DSFB) mechanism, which separates image features into high- and low-frequency components. High-frequency details such as edges and strokes are enhanced with event data through the event-guided high-frequency (EGH) mechanism, while low-frequency components, which carry the global structure, are refined by the text-guided low-frequency (TGL) mechanism using a pre-trained text recognizer to ensure textual coherence. To further improve cross-modal integration, we introduce the cross-modal fusion (CMF) mechanism, which aligns event and image features to enable robust information fusion. Extensive experiments demonstrate that EvTSR outperforms existing methods.
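The abstract does not say how DSFB splits features into frequency bands or how EGH injects event data. The snippet below is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a box-filter low-pass, with the high-frequency part taken as the residual (so the two parts sum back to the input). It also assumes a hypothetical multiplicative gate driven by a normalized event-magnitude map. The names `box_blur`, `frequency_split`, `event_boost_high`, and the parameter `alpha` are illustrative only.

```python
import numpy as np


def box_blur(x: np.ndarray, k: int = 3) -> np.ndarray:
    """Low-pass filter: mean over a k x k window with edge padding.

    x has shape (C, H, W); the output has the same shape.
    (Assumed choice of low-pass; the paper's filter is not specified.)
    """
    if k % 2 == 0:
        raise ValueError("kernel size k must be odd")
    r = k // 2
    padded = np.pad(x, ((0, 0), (r, r), (r, r)), mode="edge")
    h, w = x.shape[1], x.shape[2]
    out = np.zeros_like(x, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[:, dy:dy + h, dx:dx + w]
    return out / (k * k)


def frequency_split(feat: np.ndarray, k: int = 3):
    """Split features into low- and high-frequency parts with low + high == feat."""
    low = box_blur(feat, k)
    high = feat - low  # residual keeps edges and stroke-like detail
    return low, high


def event_boost_high(high: np.ndarray, event_map: np.ndarray, alpha: float = 1.0):
    """Hypothetical event guidance: amplify high frequencies where events fire.

    event_map has shape (H, W); its magnitude is normalized to [0, 1] and
    broadcast over the channel axis.
    """
    mag = np.abs(event_map)
    mag = mag / (mag.max() + 1e-8)
    return high * (1.0 + alpha * mag[None, :, :])


rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 8, 8))    # toy image features (C, H, W)
events = rng.standard_normal((8, 8))     # toy accumulated event map (H, W)

low, high = frequency_split(feat)
fused = low + event_boost_high(high, events, alpha=0.5)
```

Taking the high-frequency branch as a residual means the split loses no information: with a zero event map the gate reduces to the identity and `fused` equals `feat`. In this sketch the low-frequency branch is passed through unchanged, whereas the paper refines it with text-recognizer guidance (TGL).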

Published

2026-03-14

How to Cite

Qi, Z., Xiao, Z., Zhao, H., Zhao, Y., Xue, F., & Jia, W. (2026). Event-Guided Scene Text Image Super-Resolution. Proceedings of the AAAI Conference on Artificial Intelligence, 40(10), 8502–8510. https://doi.org/10.1609/aaai.v40i10.37801

Issue

Vol. 40 No. 10 (2026)

Section

AAAI Technical Track on Computer Vision VII