LaTeX2Layout: High-Fidelity, Scalable Document Layout Annotation Pipeline for Layout Detection

Authors

  • Feijiang Han, University of Pennsylvania
  • Zelong Wang, University of Pennsylvania
  • Bowen Wang, University of Pennsylvania
  • Xinxin Liu, University of Pennsylvania
  • Skyler Cheung, University of Pennsylvania
  • Delip Rao, University of Pennsylvania
  • Chris Callison-Burch, University of Pennsylvania
  • Lyle Ungar, University of Pennsylvania

DOI:

https://doi.org/10.1609/aaai.v40i37.40349

Abstract

General-purpose Vision-Language Models (VLMs) are increasingly integral to modern AI systems for document understanding, yet their ability to perform fine-grained layout analysis remains severely underdeveloped. Overcoming this limitation requires large-scale, high-fidelity training datasets. However, current annotation methods that rely on parsing rendered PDFs are costly, error-prone, and difficult to scale. We propose a different paradigm: extracting ground-truth layout directly from the LaTeX compilation process rather than the final PDF. We present LaTeX2Layout, a generalizable procedural pipeline that recovers pixel-accurate bounding boxes and reading order from compiler traces. This enables the generation of a 140K-page dataset, including 120K programmatically generated synthetic variants that more than double the layout diversity of real-world data. Using this dataset, we fine-tune an efficient 3B-parameter VLM with an easy-to-hard curriculum that accelerates convergence. Our model achieves Kendall's tau=0.95 for reading order and mAP@50=0.91 for element grounding, delivering nearly 200% relative improvement over strong zero-shot baselines such as GPT-4o and Claude-3.7.
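
The two metrics reported above can be illustrated with a minimal sketch. It is not the authors' evaluation code, and the function names, the element ids, and the `(x0, y0, x1, y1)` box format are assumptions made for illustration. The abstract does not state which variant of Kendall's tau is used, so the sketch assumes tau-a over two orderings of the same elements. For mAP@50, only the matching criterion is shown (a predicted box counts as a hit when its intersection-over-union with a ground-truth box is at least 0.5), not the full averaging of precision over classes.

```python
from itertools import combinations


def kendall_tau(pred_order, true_order):
    """Kendall's tau-a between two orderings of the same element ids.

    Assumption: both sequences contain the same ids, each exactly once.
    """
    if len(set(pred_order)) != len(pred_order) or set(pred_order) != set(true_order):
        raise ValueError("orderings must contain the same unique element ids")
    n = len(true_order)
    if n < 2:
        raise ValueError("need at least two elements to compare orderings")

    pred_rank = {elem: i for i, elem in enumerate(pred_order)}
    concordant = discordant = 0
    # In every pair (a, b) drawn here, a precedes b in the ground truth.
    for a, b in combinations(true_order, 2):
        if pred_rank[a] < pred_rank[b]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def iou(box_a, box_b):
    """Intersection-over-union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix1, iy1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match_at_50(pred_box, true_box):
    """Matching criterion behind mAP@50: IoU of at least 0.5."""
    return iou(pred_box, true_box) >= 0.5
```

A tau of 1.0 means the predicted reading order agrees with the ground truth on every pair of elements, and -1.0 means it is fully reversed. A single swap of two adjacent elements on a four-element page leaves five of the six pairs in agreement.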

Published

2026-03-14

How to Cite

Han, F., Wang, Z., Wang, B., Liu, X., Cheung, S., Rao, D., Callison-Burch, C., & Ungar, L. (2026). LaTeX2Layout: High-Fidelity, Scalable Document Layout Annotation Pipeline for Layout Detection. Proceedings of the AAAI Conference on Artificial Intelligence, 40(37), 30907-30915. https://doi.org/10.1609/aaai.v40i37.40349

Section

AAAI Technical Track on Natural Language Processing II