Grab What You Need: Rethinking Complex Table Structure Recognition with Flexible Components Deliberation

Authors

  • Hao Liu Tencent YouTu Lab
  • Xin Li Tencent YouTu Lab
  • Mingming Gong Tencent YouTu Lab
  • Bing Liu Tencent YouTu Lab
  • Yunfei Wu Tencent YouTu Lab
  • Deqiang Jiang Tencent YouTu Lab
  • Yinsong Liu Tencent YouTu Lab
  • Xing Sun Tencent YouTu Lab

DOI:

https://doi.org/10.1609/aaai.v38i4.28149

Keywords:

CV: Applications, CV: Language and Vision, CV: Multi-modal Vision

Abstract

Recently, the Table Structure Recognition (TSR) task, which aims to convert table structure into machine-readable formats, has received increasing interest in the community. Despite their impressive success, most methods based on a single table component cannot perform well on irregular tables, which are affected not only by complicated inner structure but also by exterior capture distortion. In this paper, we formulate this as the Complex TSR problem, in which the performance degradation of existing methods is attributable to their inefficient use of components and redundant post-processing. To mitigate it, we shift our perspective from table component extraction towards the efficient leveraging of multiple components, which awaits further exploration in the field. Specifically, we propose a seminal method, termed GrabTab, equipped with a newly proposed Component Deliberator, to handle various types of tables in a unified framework. Thanks to its progressive deliberation mechanism, GrabTab can flexibly adapt to most complex tables by selecting suitable components, without involving complicated post-processing. Quantitative experimental results on public benchmarks demonstrate that our method significantly outperforms state-of-the-art methods, especially in more challenging scenarios.

Published

2024-03-24

How to Cite

Liu, H., Li, X., Gong, M., Liu, B., Wu, Y., Jiang, D., … Sun, X. (2024). Grab What You Need: Rethinking Complex Table Structure Recognition with Flexible Components Deliberation. Proceedings of the AAAI Conference on Artificial Intelligence, 38(4), 3603–3611. https://doi.org/10.1609/aaai.v38i4.28149

Issue

Vol. 38 No. 4 (2024)

Section

AAAI Technical Track on Computer Vision III