PGNet: Real-time Arbitrarily-Shaped Text Spotting with Point Gathering Network

Authors

  • Pengfei Wang, Xidian University
  • Chengquan Zhang, Baidu Inc.
  • Fei Qi, Xidian University
  • Shanshan Liu, Baidu Inc.
  • Xiaoqiang Zhang, Baidu Inc.
  • Pengyuan Lyu, Baidu Inc.
  • Junyu Han, Baidu Inc.
  • Jingtuo Liu, Baidu Inc.
  • Errui Ding, Baidu Inc.
  • Guangming Shi, Xidian University

DOI:

https://doi.org/10.1609/aaai.v35i4.16383

Keywords:

Language and Vision

Abstract

The reading of arbitrarily-shaped text has received increasing research attention. However, existing text spotters are mostly built on two-stage frameworks or character-based methods, which rely on Non-Maximum Suppression (NMS), Region-of-Interest (RoI) operations, or character-level annotations. To address these problems, we propose a novel fully convolutional Point Gathering Network (PGNet) for reading arbitrarily-shaped text in real time. PGNet is a single-shot text spotter in which the pixel-level character classification map is learned with the proposed PG-CTC loss, avoiding the need for character-level annotations. With a PG-CTC decoder, we gather high-level character classification vectors from two-dimensional space and decode them into text symbols without NMS or RoI operations, which ensures high efficiency. Additionally, a graph refinement module (GRM) is proposed that reasons about the relations between each character and its neighbors to optimize the coarse recognition and improve end-to-end performance. Experiments show that the proposed method achieves competitive accuracy while significantly improving running speed. In particular, on Total-Text it runs at 46.7 FPS, surpassing previous spotters by a large margin.
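
The gather-then-decode idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the ordered points of one text instance are already known, that the character classification map is a plain nested list indexed as [row][column][class], and that decoding is ordinary greedy CTC (argmax per point, merge repeats, drop blanks). The function names, the alphabet, and the blank index are hypothetical choices made for illustration only.

```python
from typing import List, Sequence, Tuple

ScoreVector = Sequence[float]
ScoreMap = Sequence[Sequence[ScoreVector]]  # assumed layout: [row][col][class]


def gather_points(char_map: ScoreMap,
                  points: Sequence[Tuple[int, int]]) -> List[ScoreVector]:
    """Collect the class-score vector stored at each (row, col) point, in order."""
    return [char_map[r][c] for r, c in points]


def ctc_greedy_decode(vectors: Sequence[ScoreVector],
                      alphabet: str,
                      blank: int) -> str:
    """Greedy CTC decoding: argmax per step, merge repeats, drop blanks."""
    chars = []
    prev = blank
    for vec in vectors:
        idx = max(range(len(vec)), key=vec.__getitem__)  # argmax over classes
        if idx != blank and idx != prev:
            chars.append(alphabet[idx])
        prev = idx
    return "".join(chars)


def read_instance(char_map: ScoreMap,
                  points: Sequence[Tuple[int, int]],
                  alphabet: str,
                  blank: int) -> str:
    """Gather score vectors along one text instance and decode them to a string."""
    return ctc_greedy_decode(gather_points(char_map, points), alphabet, blank)


# Toy 2x4 map with three classes: 'a' (0), 'b' (1), and blank (2, assumed).
BLANK_VEC = [0.1, 0.1, 0.8]
toy_map = [
    [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1], BLANK_VEC, [0.1, 0.7, 0.2]],
    [BLANK_VEC, BLANK_VEC, BLANK_VEC, BLANK_VEC],
]
toy_points = [(0, 0), (0, 1), (0, 2), (0, 3)]  # ordered points along the text
toy_text = read_instance(toy_map, toy_points, alphabet="ab", blank=2)
```

In this toy case the per-point argmax sequence is a, a, blank, b, so the repeated "a" is merged and the blank is dropped. Because the points may follow any curve through the map, the same two steps apply to curved text without cropping or rectifying a region first.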

Published

2021-05-18

How to Cite

Wang, P., Zhang, C., Qi, F., Liu, S., Zhang, X., Lyu, P., Han, J., Liu, J., Ding, E., & Shi, G. (2021). PGNet: Real-time Arbitrarily-Shaped Text Spotting with Point Gathering Network. Proceedings of the AAAI Conference on Artificial Intelligence, 35(4), 2782-2790. https://doi.org/10.1609/aaai.v35i4.16383

Section

AAAI Technical Track on Computer Vision III