MuDoC: An Interactive Multimodal Document-grounded Conversational AI System

Karan Taneja; Ashok K. Goel

doi:10.1609/aaaiss.v5i1.35619

Authors

Karan Taneja Georgia Institute of Technology
Ashok K. Goel Georgia Institute of Technology

DOI:

https://doi.org/10.1609/aaaiss.v5i1.35619

Abstract

Multimodal AI is an important step towards building effective tools that leverage multiple modalities in human-AI communication. Building a multimodal document-grounded AI system that can interact with long documents remains a challenge. Our work aims to fill a research gap: directly leveraging grounded visuals from documents, alongside their textual content, for response generation. We present MuDoC, an interactive conversational AI agent based on GPT-4o that generates document-grounded responses with interleaved text and figures. MuDoC's intelligent textbook interface promotes trustworthiness and enables verification of system responses by allowing instant navigation to the source text and figures in the documents. We also discuss qualitative observations of MuDoC's responses, highlighting its strengths and limitations.

Published

2025-05-28

How to Cite

Taneja, K., & Goel, A. K. (2025). MuDoC: An Interactive Multimodal Document-grounded Conversational AI System. Proceedings of the AAAI Symposium Series, 5(1), 394–398. https://doi.org/10.1609/aaaiss.v5i1.35619

Issue

Vol. 5 No. 1: Proceedings of the 2025 AAAI Spring Symposium Series

Section

Machine Learning and Knowledge Engineering for Trustworthy Multimodal and Generative AI (Position Papers)
