LRANet: Towards Accurate and Efficient Scene Text Detection with Low-Rank Approximation Network

Authors

  • Yuchen Su (Shanghai Collaborative Innovation Center of Intelligent Visual Computing, School of Computer Science, Fudan University; Baidu Inc.)
  • Zhineng Chen (Shanghai Collaborative Innovation Center of Intelligent Visual Computing, School of Computer Science, Fudan University)
  • Zhiwen Shao (China University of Mining and Technology)
  • Yuning Du (Baidu Inc.)
  • Zhilong Ji (Tomorrow Advancing Life)
  • Jinfeng Bai (Tomorrow Advancing Life)
  • Yong Zhou (China University of Mining and Technology)
  • Yu-Gang Jiang (Shanghai Collaborative Innovation Center of Intelligent Visual Computing, School of Computer Science, Fudan University)

DOI:

https://doi.org/10.1609/aaai.v38i5.28302

Keywords:

CV: Scene Analysis & Understanding, CV: Object Detection & Categorization

Abstract

Recently, regression-based methods, which localize text by predicting parameterized text shapes, have gained popularity in scene text detection. However, existing parameterized text shape methods remain limited in modeling arbitrary-shaped text because they do not exploit text-specific shape information. Moreover, the time cost of the entire pipeline has been largely overlooked, leading to suboptimal overall inference speed. To address these issues, we first propose a novel parameterized text shape method based on low-rank approximation. Unlike other shape representation methods that employ data-independent parameterization, our approach uses singular value decomposition and reconstructs the text shape with a few eigenvectors learned from labeled text contours. By exploiting the shape correlation among different text contours, our method achieves consistency, compactness, simplicity, and robustness in shape representation. Next, we propose a dual assignment scheme to speed up detection. It adopts a sparse assignment branch to accelerate inference and, meanwhile, provides ample supervision signals for training through a dense assignment branch. Building on these designs, we implement an accurate and efficient arbitrary-shaped text detector named LRANet. Extensive experiments on several challenging benchmarks demonstrate the superior accuracy and efficiency of LRANet compared with state-of-the-art methods. Code is available at: https://github.com/ychensu/LRANet.git
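The abstract describes representing a text contour with a few basis vectors obtained by singular value decomposition of labeled contours. The following is a minimal NumPy sketch of that general idea (a plain truncated SVD), not the authors' implementation. It assumes each contour is resampled to a fixed number of ordered points, centered, scaled, and flattened to (x1, y1, ..., xN, yN); the helper names `make_curved_text`, `normalize_contour`, `learn_basis`, `encode`, `decode`, the normalization, and the point count are all hypothetical choices for illustration. The rows of `Vt` returned by the SVD of the contour matrix A are eigenvectors of AᵀA, which is how "eigenvectors learned from labeled text contours" is interpreted here.

```python
import numpy as np


def make_curved_text(n_side: int, a: float, h: float) -> np.ndarray:
    """Hypothetical synthetic text region with parabolic top and bottom edges.

    Returns a (2 * n_side, 2) polygon: top edge left-to-right, then bottom edge
    right-to-left, so the points trace the contour in order.
    """
    x = np.linspace(-1.0, 1.0, n_side)
    top = np.stack([x, a * x**2 + h / 2], axis=1)
    bottom = np.stack([x[::-1], a * x[::-1] ** 2 - h / 2], axis=1)
    return np.concatenate([top, bottom], axis=0)


def normalize_contour(points: np.ndarray) -> np.ndarray:
    """Center an (N, 2) contour at its centroid and scale it into [-1, 1].

    Assumption: an illustrative normalization, not necessarily LRANet's own.
    """
    centered = points - points.mean(axis=0)
    scale = np.abs(centered).max()
    return centered / scale if scale > 0 else centered


def learn_basis(contours: np.ndarray, k: int) -> np.ndarray:
    """Learn k orthonormal basis vectors from M flattened contours.

    contours: (M, 2N) matrix, one flattened (x1, y1, ..., xN, yN) contour per row.
    Returns a (k, 2N) matrix whose rows are the top-k right singular vectors.
    """
    # Thin SVD: contours = U @ diag(S) @ Vt, with singular values sorted descending.
    _, _, vt = np.linalg.svd(contours, full_matrices=False)
    return vt[:k]


def encode(contour: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Project a flattened contour onto the basis, giving k coefficients."""
    return basis @ contour


def decode(coeffs: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Rebuild a flattened contour as a linear combination of basis vectors."""
    return basis.T @ coeffs


# Usage: learn a basis from synthetic "labeled" contours, then reconstruct them.
rng = np.random.default_rng(0)
n_side = 7  # hypothetical: 7 points per long side, 14 points per contour
train = np.stack([
    normalize_contour(make_curved_text(n_side, a, h)).reshape(-1)
    for a, h in zip(rng.uniform(-0.5, 0.5, 200), rng.uniform(0.2, 0.6, 200))
])  # shape (200, 28)


def max_reconstruction_error(k: int) -> float:
    """Largest absolute coordinate error when every contour is encoded with k coefficients."""
    basis = learn_basis(train, k)
    recon = np.stack([decode(encode(c, basis), basis) for c in train])
    return float(np.abs(recon - train).max())


print(max_reconstruction_error(1), max_reconstruction_error(3))
```

Because these synthetic shapes vary in only two parameters (curvature and height) around a fixed horizontal layout, three basis vectors reconstruct them to floating-point precision while a single vector does not. On real annotations the number of components would have to be chosen empirically, trading compactness of the regression target against reconstruction accuracy.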

Published

2024-03-24

How to Cite

Su, Y., Chen, Z., Shao, Z., Du, Y., Ji, Z., Bai, J., Zhou, Y., & Jiang, Y.-G. (2024). LRANet: Towards Accurate and Efficient Scene Text Detection with Low-Rank Approximation Network. Proceedings of the AAAI Conference on Artificial Intelligence, 38(5), 4979-4987. https://doi.org/10.1609/aaai.v38i5.28302

Issue

Vol. 38, No. 5 (2024)

Section

AAAI Technical Track on Computer Vision IV