Knowledge-Enhanced Historical Document Segmentation and Recognition
DOI:
https://doi.org/10.1609/aaai.v38i8.28683
Keywords:
DMKM: Mining of Visual, Multimedia & Multimodal Data, CV: Learning & Optimization for CV, CV: Visual Reasoning & Symbolic Representations, KRR: Applications
Abstract
Optical Character Recognition (OCR) of historical document images remains a challenging task because of distorted input images, a large number of uncommon characters, and the scarcity of labeled data, all of which prevent modern deep learning-based OCR techniques from achieving good recognition accuracy. Meanwhile, a substantial amount of expert knowledge can be utilized in this task. However, such knowledge is usually complicated and can only be expressed accurately in formal languages such as first-order logic (FOL), which makes it difficult to integrate directly into deep learning models. This paper proposes KESAR, a novel Knowledge-Enhanced Document Segmentation And Recognition method for historical document images based on the Abductive Learning (ABL) framework. The segmentation and recognition models are enhanced by incorporating background knowledge for character extraction and prediction, followed by an efficient joint optimization of both models. We validate the effectiveness of KESAR on historical document datasets. The experimental results demonstrate that our method can simultaneously utilize knowledge-driven reasoning and data-driven learning, and that it outperforms the current state-of-the-art methods.
Published
2024-03-24
How to Cite
Gao, E.-H., Huang, Y.-X., Hu, W.-C., Zhu, X.-H., & Dai, W.-Z. (2024). Knowledge-Enhanced Historical Document Segmentation and Recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 38(8), 8409-8416. https://doi.org/10.1609/aaai.v38i8.28683
Issue
Section
AAAI Technical Track on Data Mining & Knowledge Management