Knowledge-Enhanced Historical Document Segmentation and Recognition

Authors

  • En-Hao Gao, Nanjing University
  • Yu-Xuan Huang, Nanjing University
  • Wen-Chao Hu, Nanjing University
  • Xin-Hao Zhu, Nanjing University
  • Wang-Zhou Dai, Nanjing University

DOI:

https://doi.org/10.1609/aaai.v38i8.28683

Keywords:

DMKM: Mining of Visual, Multimedia & Multimodal Data, CV: Learning & Optimization for CV, CV: Visual Reasoning & Symbolic Representations, KRR: Applications

Abstract

Optical Character Recognition (OCR) of historical document images remains challenging because of distorted input images, a large number of uncommon characters, and scarce labeled data, all of which prevent modern deep learning-based OCR techniques from achieving good recognition accuracy. Meanwhile, a substantial amount of expert knowledge exists that can be used in this task. However, such knowledge is usually complex and can only be expressed accurately in formal languages such as first-order logic (FOL), which makes it difficult to integrate directly into deep learning models. This paper proposes KESAR, a novel Knowledge-Enhanced Document Segmentation And Recognition method for historical document images based on the Abductive Learning (ABL) framework. The segmentation and recognition models are enhanced by incorporating background knowledge for character extraction and prediction, followed by an efficient joint optimization of both models. We validate the effectiveness of KESAR on historical document datasets. The experimental results demonstrate that our method can simultaneously utilize knowledge-driven reasoning and data-driven learning, and that it outperforms current state-of-the-art methods.
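In the Abductive Learning (ABL) framework that the abstract builds on, a knowledge base is used to revise a model's predicted labels into labels consistent with background knowledge, and the revised labels are then used to retrain the model. The snippet below is a minimal, hypothetical sketch of that generic abduction step only, not KESAR's implementation: it assumes a toy lexicon as the knowledge base (the paper's knowledge is richer; the abstract mentions first-order logic), ignores segmentation entirely, and uses brute-force search with a "fewest changed labels, then highest confidence" criterion as one common choice. The names `abduce`, `in_lexicon`, `LEXICON`, and `probs` are illustrative inventions.

```python
from __future__ import annotations

from itertools import product
from math import log
from typing import Callable, Optional, Sequence


def abduce(
    probs: Sequence[dict[str, float]],
    kb_consistent: Callable[[tuple[str, ...]], bool],
) -> Optional[tuple[str, ...]]:
    """Hypothetical ABL-style abduction: return the KB-consistent label
    sequence that changes the fewest predicted labels (ties broken by the
    model's joint probability), or None if no candidate is consistent."""
    # The model's own guess: the most probable label at each position.
    guess = tuple(max(p, key=p.get) for p in probs)
    best: Optional[tuple[str, ...]] = None
    best_key: Optional[tuple[int, float]] = None
    # Exhaustive search over every label combination (exponential; toy sizes only).
    for cand in product(*(sorted(p) for p in probs)):
        if not kb_consistent(cand):
            continue  # reject candidates that violate the knowledge base
        # Rank by number of revised labels, then by joint log-probability.
        dist = sum(c != g for c, g in zip(cand, guess))
        logp = sum(log(max(p[c], 1e-12)) for p, c in zip(probs, cand))
        key = (dist, -logp)
        if best_key is None or key < best_key:
            best, best_key = cand, key
    return best


# Hypothetical toy knowledge base: the recognized characters must form a known word.
LEXICON = {"cat", "cot", "dog"}


def in_lexicon(seq: tuple[str, ...]) -> bool:
    return "".join(seq) in LEXICON


# Per-character probabilities from a hypothetical recognizer; its argmax reads "cog".
probs = [
    {"c": 0.7, "d": 0.3},
    {"a": 0.45, "o": 0.55},
    {"t": 0.4, "g": 0.6},
]

result = abduce(probs, in_lexicon)
print(result)  # ('c', 'o', 't')
```

Both "cot" and "dog" need only one revision, so the tie is broken in favor of "cot", which the recognizer finds more probable. In an ABL loop, the revised labels would serve as pseudo-labels for the next round of training; a practical system would need a smarter search than exhaustive enumeration, since the candidate space grows exponentially with sequence length.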

Published

2024-03-24

How to Cite

Gao, E.-H., Huang, Y.-X., Hu, W.-C., Zhu, X.-H., & Dai, W.-Z. (2024). Knowledge-Enhanced Historical Document Segmentation and Recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 38(8), 8409-8416. https://doi.org/10.1609/aaai.v38i8.28683

Issue

Vol. 38 No. 8

Section

AAAI Technical Track on Data Mining & Knowledge Management