DocParser: Hierarchical Document Structure Parsing from Renderings

Johannes Rausch; Octavio Martinez; Fabian Bissig; Ce Zhang; Stefan Feuerriegel

doi:10.1609/aaai.v35i5.16558

Authors

Johannes Rausch, Department of Computer Science, ETH Zurich
Octavio Martinez, Department of Computer Science, ETH Zurich
Fabian Bissig, Department of Computer Science, ETH Zurich
Ce Zhang, Department of Computer Science, ETH Zurich
Stefan Feuerriegel, Department of Management, Technology, and Economics, ETH Zurich

Keywords:

Applications, Information Extraction

Abstract

Translating renderings (e.g., PDFs, scans) into hierarchical document structures is in high demand across many real-world applications. However, a holistic, principled approach to inferring the complete hierarchical structure of documents is missing. As a remedy, we developed “DocParser”: an end-to-end system for parsing the complete document structure, including all text elements, nested figures, tables, and table cell structures. Our second contribution is a dataset for evaluating hierarchical document structure parsing. Our third contribution is a scalable learning framework for settings where domain-specific data are scarce, which we address with a novel approach to weak supervision that significantly improves document structure parsing performance. Our experiments confirm the effectiveness of the proposed weak supervision: compared to the baseline without weak supervision, it improves the mean average precision for detecting document entities by 39.1% and the F1 score for classifying hierarchical relations by 35.8%.
