Vision-Language Models Guided Graph Concept Reasoning for Interpretable Diabetic Retinopathy Diagnosis

Authors

  • Qihao Xu (Shenzhen University; Harbin Institute of Technology, Shenzhen)
  • Xiaoling Luo (Shenzhen University)
  • Yuxin Lin (Harbin Institute of Technology, Shenzhen)
  • Chengliang Liu (University of Macau)
  • Yongting Hu (Harbin Institute of Technology, Shenzhen)
  • Jinkai Li (Chengdu University of Technology)
  • Xinheng Lyu (Shenzhen University; The University of Nottingham Ningbo China)
  • Yong Xu (Harbin Institute of Technology, Shenzhen)

DOI:

https://doi.org/10.1609/aaai.v40i32.39948

Abstract

Deep neural networks (DNNs) have significantly advanced diabetic retinopathy (DR) diagnosis, yet their black-box nature limits clinical acceptance due to a lack of interpretability. Concept bottleneck models (CBMs) offer a promising solution by enabling concept-level reasoning and test-time intervention, and recent DR studies model lesions as concepts and grades as outcomes. However, current methods often ignore the relationships between lesion concepts across different DR grades and struggle when fine-grained lesion annotations are unavailable, limiting their interpretability and real-world applicability. To bridge these gaps, we propose VLM-GCR, a vision-language model guided graph concept reasoning framework for interpretable DR diagnosis. VLM-GCR emulates the diagnostic process of ophthalmologists by constructing a grading-aware lesion concept graph that explicitly models the interactions among lesions and their relationships to disease grades. In concept-free clinical scenarios, our method introduces a vision-language guided dynamic concept pseudo-labeling mechanism to compensate for the difficulty existing concept-based models have with fine-grained lesion recognition. Additionally, we introduce a multi-level intervention method that supports error correction, enabling transparent and robust human-AI collaboration. Experiments on two public DR benchmarks show that VLM-GCR achieves strong performance in both lesion and grading tasks, while delivering clear and clinically meaningful reasoning steps.
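To make the concept-bottleneck-with-graph idea in the abstract concrete, here is a minimal, generic sketch, not the paper's VLM-GCR: an image embedding is mapped to lesion-concept scores, the scores are propagated over an assumed lesion-relation graph, and a linear head predicts the DR grade. The concept names, adjacency matrix, and random weights are all illustrative assumptions; the `interventions` argument emulates the test-time correction a clinician could apply at the bottleneck.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative lesion concepts and DR grades 0-4 (assumed, not from the paper).
CONCEPTS = ["microaneurysm", "hemorrhage", "hard_exudate", "neovascularization"]
N_GRADES = 5

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Assumed adjacency over lesion concepts, row-normalized so that one
# propagation step averages each concept with its related concepts.
A = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
], dtype=float)
A = A / A.sum(axis=1, keepdims=True)

W_concept = rng.normal(size=(16, len(CONCEPTS)))      # embedding -> concept logits
W_grade = rng.normal(size=(len(CONCEPTS), N_GRADES))  # concepts -> grade logits

def predict(embedding, interventions=None):
    """Predict a DR grade from an image embedding via concept scores.

    `interventions` maps a concept name to a corrected score in [0, 1],
    emulating clinician test-time intervention on the bottleneck."""
    scores = sigmoid(embedding @ W_concept)  # lesion-concept scores
    scores = A @ scores                      # propagate over the concept graph
    if interventions:
        for name, value in interventions.items():
            scores[CONCEPTS.index(name)] = value  # overwrite with expert value
    grade_logits = scores @ W_grade
    return scores, int(np.argmax(grade_logits))

emb = rng.normal(size=16)
scores, grade = predict(emb)
_, corrected = predict(emb, interventions={"neovascularization": 1.0})
```

Because the grade is a linear function of the (interpretable) concept scores, editing one score and re-running the head is all an intervention requires; the actual paper additionally conditions the graph on the grade and derives pseudo-labels from a vision-language model, which this toy omits.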

Published

2026-03-14

How to Cite

Xu, Q., Luo, X., Lin, Y., Liu, C., Hu, Y., Li, J., … Xu, Y. (2026). Vision-Language Models Guided Graph Concept Reasoning for Interpretable Diabetic Retinopathy Diagnosis. Proceedings of the AAAI Conference on Artificial Intelligence, 40(32), 27314–27322. https://doi.org/10.1609/aaai.v40i32.39948

Section

AAAI Technical Track on Machine Learning IX