Vision-Language Models Guided Graph Concept Reasoning for Interpretable Diabetic Retinopathy Diagnosis

Authors

  • Qihao Xu (Shenzhen University; Harbin Institute of Technology, Shenzhen)
  • Xiaoling Luo (Shenzhen University)
  • Yuxin Lin (Harbin Institute of Technology, Shenzhen)
  • Chengliang Liu (University of Macau)
  • Yongting Hu (Harbin Institute of Technology, Shenzhen)
  • Jinkai Li (Chengdu University of Technology)
  • Xinheng Lyu (Shenzhen University; The University of Nottingham Ningbo China)
  • Yong Xu (Harbin Institute of Technology, Shenzhen)

DOI:

https://doi.org/10.1609/aaai.v40i32.39948

Abstract

Deep neural networks (DNNs) have significantly advanced diabetic retinopathy (DR) diagnosis, yet their black-box nature limits clinical acceptance due to a lack of interpretability. Concept bottleneck models (CBMs) offer a promising solution by enabling concept-level reasoning and test-time intervention, and recent DR studies model lesions as concepts and grades as outcomes. However, current methods often ignore the relationships between lesion concepts across different DR grades and struggle when fine-grained lesion annotations are unavailable, limiting their interpretability and real-world applicability. To bridge these gaps, we propose VLM-GCR, a vision-language model guided graph concept reasoning framework for interpretable DR diagnosis. VLM-GCR emulates the diagnostic process of ophthalmologists by constructing a grading-aware lesion concept graph that explicitly models the interactions among lesions and their relationships to disease grades. In concept-free clinical scenarios, our method introduces a vision-language guided dynamic concept pseudo-labeling mechanism to compensate for the difficulty existing concept-based models have with fine-grained lesion recognition. Additionally, we introduce a multi-level intervention method that supports error correction, enabling transparent and robust human-AI collaboration. Experiments on two public DR benchmarks show that VLM-GCR achieves strong performance in both lesion and grading tasks, while delivering clear and clinically meaningful reasoning steps.
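To make the concept-bottleneck-with-graph idea in the abstract concrete, here is a minimal, generic sketch, not the paper's VLM-GCR: an image embedding is mapped to lesion-concept scores, the scores are propagated over an assumed lesion-relation graph, and a linear head predicts the DR grade. The concept names, adjacency matrix, and random weights are all illustrative assumptions; the `interventions` argument emulates the test-time correction a clinician could apply at the bottleneck.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative lesion concepts and DR grades 0-4 (assumed, not from the paper).
CONCEPTS = ["microaneurysm", "hemorrhage", "hard_exudate", "neovascularization"]
N_GRADES = 5

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Assumed adjacency over lesion concepts, row-normalized so that one
# propagation step averages each concept with its related concepts.
A = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
], dtype=float)
A = A / A.sum(axis=1, keepdims=True)

W_concept = rng.normal(size=(16, len(CONCEPTS)))      # embedding -> concept logits
W_grade = rng.normal(size=(len(CONCEPTS), N_GRADES))  # concepts -> grade logits

def predict(embedding, interventions=None):
    """Predict a DR grade from an image embedding via concept scores.

    `interventions` maps a concept name to a corrected score in [0, 1],
    emulating clinician test-time intervention on the bottleneck."""
    scores = sigmoid(embedding @ W_concept)  # lesion-concept scores
    scores = A @ scores                      # propagate over the concept graph
    if interventions:
        for name, value in interventions.items():
            scores[CONCEPTS.index(name)] = value  # overwrite with expert value
    grade_logits = scores @ W_grade
    return scores, int(np.argmax(grade_logits))

emb = rng.normal(size=16)
scores, grade = predict(emb)
_, corrected = predict(emb, interventions={"neovascularization": 1.0})
```

Because the grade is a linear function of the (interpretable) concept scores, editing one score and re-running the head is all an intervention requires; the actual paper additionally conditions the graph on the grade and derives pseudo-labels from a vision-language model, which this toy omits.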

Published

2026-03-14

How to Cite

Xu, Q., Luo, X., Lin, Y., Liu, C., Hu, Y., Li, J., … Xu, Y. (2026). Vision-Language Models Guided Graph Concept Reasoning for Interpretable Diabetic Retinopathy Diagnosis. Proceedings of the AAAI Conference on Artificial Intelligence, 40(32), 27314–27322. https://doi.org/10.1609/aaai.v40i32.39948

Section

AAAI Technical Track on Machine Learning IX