Like an Ophthalmologist: Dynamic Selection Driven Multi-View Learning for Diabetic Retinopathy Grading

Authors

  • Xiaoling Luo — Computer Vision Institute, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
  • Qihao Xu — Shenzhen Key Laboratory of Visual Object Detection and Recognition, Harbin Institute of Technology, Shenzhen, China
  • Huisi Wu — Computer Vision Institute, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
  • Chengliang Liu — Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China
  • Zhihui Lai — Computer Vision Institute, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
  • Linlin Shen — Computer Vision Institute, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China; National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen, China; Guangdong Provincial Key Laboratory of Intelligent Information Processing, China

DOI:

https://doi.org/10.1609/aaai.v39i18.34116

Abstract

Diabetic retinopathy (DR), with its large patient population, has become a formidable threat to human visual health. In clinical practice, multi-view fundus images are considered better suited to DR diagnosis because of their wide coverage of the field of view. Therefore, unlike most previous single-view DR grading methods, we design a dynamic selection-driven multi-view DR grading method that better fits clinical scenarios. Since lesion information plays a key role in DR diagnosis, previous methods usually boost model performance by enhancing lesion features. However, during actual diagnosis, ophthalmologists not only focus on the crucial regions but also exclude irrelevant features to ensure accurate judgment. To this end, we introduce the idea of dynamic selection and design a series of selection mechanisms ranging from fine to coarse granularity. In this work, we first introduce an Ophthalmic Image Reader (OIR) agent to provide the model with pixel-level prompts of suspected lesion areas. Moreover, a Multi-View Token Selection Module (MVTSM) is designed to prune redundant feature tokens and realize dynamic selection of key information. In the final decision stage, we dynamically fuse multi-view features through a novel Multi-View Mixture of Experts Module (MVMoEM) to enhance key views and reduce the impact of conflicting views. Extensive experiments on a large multi-view fundus image dataset with 34,452 images demonstrate that our method performs favorably against state-of-the-art models.
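The two selection mechanisms the abstract names — pruning redundant feature tokens (MVTSM) and gate-weighted fusion of per-view features (MVMoEM) — can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the function names, the top-k scoring rule, and the softmax gating are assumptions standing in for the paper's learned modules.

```python
import numpy as np

def select_tokens(tokens, scores, keep_ratio=0.5):
    """Toy stand-in for MVTSM: keep only the highest-scoring tokens.

    tokens: (n_tokens, dim) feature tokens from one view.
    scores: (n_tokens,) relevance scores (learned in the real model).
    """
    k = max(1, int(len(tokens) * keep_ratio))
    keep = np.sort(np.argsort(scores)[-k:])  # top-k, original order preserved
    return tokens[keep]

def moe_fuse(view_feats, gate_w):
    """Toy stand-in for MVMoEM: softmax-gated weighted sum over views.

    view_feats: (n_views, dim) pooled per-view features.
    gate_w: (dim,) gating vector (a learned network in the real model).
    Returns the fused feature and the per-view weights.
    """
    logits = view_feats @ gate_w
    weights = np.exp(logits - logits.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ view_feats, weights

# Toy example: 4 fundus views, each reduced to an 8-dim feature.
rng = np.random.default_rng(0)
views = rng.normal(size=(4, 8))
fused, w = moe_fuse(views, rng.normal(size=8))
kept = select_tokens(rng.normal(size=(16, 8)), rng.normal(size=16), keep_ratio=0.25)
```

In this sketch the gate up-weights views whose features align with the gating vector, mirroring the stated goal of enhancing key views while suppressing conflicting ones; in the paper both the scores and the gate are learned end to end.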

Published

2025-04-11

How to Cite

Luo, X., Xu, Q., Wu, H., Liu, C., Lai, Z., & Shen, L. (2025). Like an Ophthalmologist: Dynamic Selection Driven Multi-View Learning for Diabetic Retinopathy Grading. Proceedings of the AAAI Conference on Artificial Intelligence, 39(18), 19224–19232. https://doi.org/10.1609/aaai.v39i18.34116

Issue

Vol. 39 No. 18 (2025)

Section

AAAI Technical Track on Machine Learning IV