DocMamba: Efficient Document Pre-training with State Space Model
DOI:
https://doi.org/10.1609/aaai.v39i22.34584
Abstract
In recent years, visually-rich document understanding has attracted increasing attention. Transformer-based pre-trained models have become the mainstream approach, yielding significant performance gains in this field. However, the self-attention mechanism's quadratic computational complexity hinders their efficiency and their ability to process long documents. In this paper, we present DocMamba, a novel framework based on the state space model. It is designed to reduce computational complexity to linear while preserving global modeling capabilities. To further enhance its effectiveness in document processing, we introduce the Segment-First Bidirectional Scan (SFBS) to capture contiguous semantic information. Experimental results demonstrate that DocMamba achieves new state-of-the-art results on downstream datasets such as FUNSD, CORD, and SROIE, while significantly improving speed and reducing memory usage. Notably, experiments on HRDoc confirm DocMamba's potential for length extrapolation.
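To make the abstract's two key ideas concrete, the sketch below illustrates (a) a linear-time state-space recurrence and (b) a segment-first ordering followed by forward and backward scans, in the spirit of SFBS. This is a minimal illustration based only on the abstract's description: the function names, the diagonal recurrence h_t = a·h_{t-1} + b·x_t, and the fixed scalars a and b are all simplifying assumptions, not the authors' implementation.

```python
import torch

def ssm_scan(x, a=0.9, b=1.0):
    """Toy diagonal state-space recurrence h_t = a*h_{t-1} + b*x_t,
    computed in O(T) time over a (T, D) sequence (vs. O(T^2) attention)."""
    h = torch.zeros(x.shape[-1])
    outputs = []
    for x_t in x:                      # one pass: linear in sequence length
        h = a * h + b * x_t
        outputs.append(h)
    return torch.stack(outputs)

def sfbs(features, segment_ids, a=0.9, b=1.0):
    """Segment-first bidirectional scan (illustrative sketch).

    features:    (T, D) token features
    segment_ids: (T,)  layout-segment id of each token
    """
    # 1) Reorder tokens so tokens from the same segment are contiguous;
    #    a stable sort preserves reading order within each segment.
    order = torch.argsort(segment_ids, stable=True)
    x = features[order]

    # 2) Bidirectional linear-time scans over the reordered sequence.
    fwd = ssm_scan(x, a, b)
    bwd = ssm_scan(x.flip(0), a, b).flip(0)
    y = fwd + bwd

    # 3) Restore the original token order.
    inv = torch.empty_like(order)
    inv[order] = torch.arange(order.numel())
    return y[inv]
```

The design point this sketch conveys is that grouping tokens by layout segment before scanning keeps semantically contiguous text adjacent in the scan order, so the recurrent state carries local context, while the two scan directions together approximate the global view that self-attention would otherwise provide.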
Published
2025-04-11
How to Cite
Hu, P., Zhang, Z., Ma, J., Liu, S., Du, J., & Zhang, J. (2025). DocMamba: Efficient Document Pre-training with State Space Model. Proceedings of the AAAI Conference on Artificial Intelligence, 39(22), 24095-24103. https://doi.org/10.1609/aaai.v39i22.34584
Issue
Vol. 39 No. 22 (2025)
Section
AAAI Technical Track on Natural Language Processing I