Event-Guided Scene Text Image Super-Resolution

Zihan Qi; Zeyu Xiao; Haoyi Zhao; Yang Zhao; Feng Xue; Wei Jia

doi:10.1609/aaai.v40i10.37801

Authors

Zihan Qi Hefei University of Technology
Zeyu Xiao National University of Singapore
Haoyi Zhao Hefei University of Technology
Yang Zhao Hefei University of Technology
Feng Xue Hefei University of Technology
Wei Jia Hefei University of Technology

Abstract

Scene text image super-resolution aims to enhance text legibility by recovering high-resolution text images from low-resolution inputs. However, preserving fine details such as text strokes and edges, along with textual accuracy, remains challenging, particularly in low-light environments and high-speed motion scenarios, where degradation is more severe. Event cameras, with their high temporal resolution and ability to capture intensity changes, offer a promising solution for restoring lost fine details and mitigating degradation in these challenging conditions. In this paper, we propose EvTSR, the first framework that integrates Event data for scene Text image Super-Resolution. The core of EvTSR is the dual-stream frequency boost (DSFB) mechanism, which separates image features into high- and low-frequency components. High-frequency details such as edges and strokes are enhanced using event data via the event-guided high-frequency (EGH) mechanism, while low-frequency components, responsible for global structure, are refined using the text-guided low-frequency (TGL) mechanism with a pre-trained text recognizer, ensuring textual coherence. To further improve cross-modal integration, we introduce the cross-modal fusion (CMF) mechanism, which aligns event and image features to enable robust information fusion. Extensive experiments demonstrate that EvTSR achieves superior performance over existing methods.
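
The frequency separation that DSFB relies on can be illustrated with a minimal sketch. The abstract does not say how the decomposition is performed, so the fixed box filter below is an assumption chosen for simplicity, and the names `box_blur` and `split_frequencies` are hypothetical rather than taken from EvTSR. The low-frequency part is a local average of a 2D feature map, and the high-frequency part is the residual that carries edges and strokes.

```python
import numpy as np


def box_blur(feat: np.ndarray, k: int = 3) -> np.ndarray:
    """Average each position over a k x k window (k odd, edge-padded)."""
    pad = k // 2
    padded = np.pad(feat, pad, mode="edge")
    h, w = feat.shape
    out = np.zeros((h, w), dtype=np.float64)
    # Sum the k*k shifted copies of the padded map, then normalize.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)


def split_frequencies(feat, k: int = 3):
    """Split a 2D feature map into (low, high) with low + high == feat.

    Assumption: a box filter stands in for whatever decomposition the
    paper actually uses; this is illustrative only.
    """
    feat = np.asarray(feat, dtype=np.float64)
    low = box_blur(feat, k)   # smooth global structure
    high = feat - low         # residual: edges and fine strokes
    return low, high


# A vertical step edge: the high-frequency residual is nonzero only near it.
feat = np.zeros((8, 8))
feat[:, 4:] = 1.0
low, high = split_frequencies(feat)
```

In this toy setting the two branches sum back to the input exactly, so each can be processed separately (for example, the high-frequency branch with event features and the low-frequency branch with recognizer guidance) and then recombined.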
