Layout-Aware Document Parsing with Visual-Linguistic Fusion: The DATA-LUX with Academic Content Service Provider

Authors

  • Min Chan Kim, Kyung Hee University
  • Yeonkyung Kim, Kyung Hee University
  • Jae Won Lee, ALLBIGDAT
  • Ki Hwan Kim, ALLBIGDAT
  • Ji Woo Kwak, ALLBIGDAT
  • Jae Hong Park, Kyung Hee University

DOI:

https://doi.org/10.1609/aaai.v40i47.41438

Abstract

Organizations increasingly rely on unstructured documents such as PDFs and scanned forms to support downstream large language model (LLM) services, including search, summarization, and recommendation. However, traditional OCR systems struggle with diverse document layouts, leading to frequent errors and high labor costs. This study therefore developed DATALUX, a robust document layout system that transforms unstructured documents into structured, machine-readable data suitable for automation. Built on a transformer-based detector, DATALUX incorporates several modules for layout refinement, text-visual fusion, and layer-wise optimization to improve coherence and generalization across diverse layouts. Around January 2025, we deployed DATALUX at Nurimedia, one of the largest academic content service firms in South Korea. The firm faced the challenge of extracting metadata and references from thousands of academic papers submitted in various formats. Because existing LLM-based tools produced unreliable results, the papers had to be processed manually, creating bottlenecks in both labor and time. DATALUX enabled the automatic structuring of over 100,000 research papers a year, improving extraction accuracy to over 97%, reducing costs by more than USD 185K annually, and increasing processing speed 8.7-fold. These deployment results suggest that DATALUX enables scalable and efficient document automation in complex, high-volume environments. We thus believe that DATALUX has a significant impact on both academic and industry practice.

Published

2026-03-14

How to Cite

Kim, M. C., Kim, Y., Lee, J. W., Kim, K. H., Kwak, J. W., & Park, J. H. (2026). Layout-Aware Document Parsing with Visual-Linguistic Fusion: The DATA-LUX with Academic Content Service Provider. Proceedings of the AAAI Conference on Artificial Intelligence, 40(47), 40035–40044. https://doi.org/10.1609/aaai.v40i47.41438

Issue

Section

IAAI Technical Track on Deployed Highly Innovative Applications of AI