Patho-AgenticRAG: Towards Multimodal Agentic Retrieval-Augmented Generation for Pathology VLMs via Reinforcement Learning
DOI: https://doi.org/10.1609/aaai.v40i35.40239

Abstract
Although Vision Language Models (VLMs) have shown generalization in medical imaging, pathology presents unique challenges due to ultra-high resolution, complex tissue structures, and nuanced semantics. These factors make pathology VLMs prone to hallucinations, i.e., generating outputs inconsistent with visual evidence, which undermines clinical trust. Existing RAG approaches in this domain largely depend on text-based knowledge bases, limiting their ability to leverage diagnostic visual cues. To address this, we propose Patho-AgenticRAG, a multimodal RAG framework with a database built on page-level embeddings from authoritative pathology textbooks. Unlike traditional text-only retrieval systems, it supports joint text–image search, enabling retrieval of textbook pages that contain both the queried text and relevant visual cues, thus avoiding the loss of critical image-based information. Patho-AgenticRAG also supports reasoning, task decomposition, and multi-turn search interactions, improving accuracy in complex diagnostic scenarios. Experiments show that Patho-AgenticRAG significantly outperforms existing multimodal models in complex pathology tasks like multiple-choice diagnosis and visual question answering.
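The abstract's joint text–image search over page-level embeddings can be illustrated with a minimal sketch: fuse the text and image query embeddings, then rank pages by cosine similarity. The weighted-sum fusion, the `alpha` parameter, and the function names below are illustrative assumptions for exposition, not the paper's actual method (which is an agentic pipeline trained via reinforcement learning).

```python
import numpy as np

def joint_query(text_emb, image_emb, alpha=0.5):
    """Fuse text and image query embeddings by a weighted sum.

    NOTE: this fusion scheme is an assumption for illustration;
    the paper does not specify this exact formula.
    """
    v = alpha * text_emb + (1 - alpha) * image_emb
    return v / np.linalg.norm(v)

def cosine_topk(query_vec, page_embs, k=3):
    """Return indices of the k textbook pages most similar to the query.

    query_vec: (d,) fused query embedding
    page_embs: (n, d) matrix of page-level embeddings
    """
    q = query_vec / np.linalg.norm(query_vec)
    p = page_embs / np.linalg.norm(page_embs, axis=1, keepdims=True)
    sims = p @ q                      # cosine similarity per page
    return np.argsort(-sims)[:k]      # indices of the k best pages
```

Because each embedding covers a whole page, a page whose figures match the image query can rank highly even when its text alone would not, which is the advantage the abstract claims over text-only retrieval.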
Published
2026-03-14
How to Cite
Zhang, W., Guo, J., Zhang, H., Zhang, P., Chen, J., Zhang, S., Zhang, Z., Yi, Y., & Bu, H. (2026). Patho-AgenticRAG: Towards Multimodal Agentic Retrieval-Augmented Generation for Pathology VLMs via Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(35), 29921-29929. https://doi.org/10.1609/aaai.v40i35.40239
Issue
Section
AAAI Technical Track on Multiagent Systems