Deep Hierarchies and Invariant Disease-Indicative Feature Learning for Computer Aided Diagnosis of Multiple Fundus Diseases

Authors

  • Yuxin Lin, School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen
  • Wei Wang, School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen
  • Xiaoling Luo, College of Computer Science and Software Engineering, Shenzhen University
  • Zhihao Wu, School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen
  • Chengliang Liu, Computer Science and Engineering, Hong Kong University of Science and Technology
  • Jie Wen, School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen
  • Yong Xu, School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen

DOI:

https://doi.org/10.1609/aaai.v39i5.32566

Abstract

Advances in computer vision have produced numerous models for screening fundus diseases. However, recognizing multiple fundus diseases is often hampered by the co-occurrence of several disease types and the confluence of lesion types in a single fundus image. This paper addresses these challenges by framing them as two problems: multi-level feature fusion and self-supervised learning of disease-indicative features. We decode fundus images at several levels of granularity to capture scenarios in which multiple diseases and lesions co-occur. To integrate these features, we introduce a hierarchical vision transformer (HVT) that captures both inter-level and intra-level dependencies. A novel forward-attention module integrates lower-level semantic information into higher semantic layers, enriching the representation of complex features. We also introduce a novel self-supervised mask-consistent feature learner (MCFL). Unlike traditional masked autoencoders, which reconstruct the original image with an encoder-decoder structure, MCFL uses a teacher-student framework to reconstruct mask-consistent feature maps. In this setup, an exponential moving average is used to derive classification-guided features, which serve as the reconstruction targets in place of the original images. This approach facilitates the extraction of disease-indicative features. Extensive experiments demonstrate that our method significantly outperforms existing state-of-the-art models.
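
The teacher-student idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no architectural details, so the encoder, the loss, and every name here (`ema_update`, `encode`, `mask_consistency_loss`, the `momentum` value) are hypothetical. It only shows one plausible reading, in which the teacher sees the full input, the student sees a masked input, the student's features at masked positions are compared with the teacher's, and the teacher's weights track the student's through an exponential moving average.

```python
def ema_update(teacher, student, momentum=0.99):
    """Move each teacher weight toward the student's by an exponential moving average."""
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


def encode(params, patches, mask=None):
    """Toy encoder (an assumption, not the paper's HVT).

    Each patch feature is w * (mean of the patch and its neighbours) + b.
    Masked patches are replaced by 0.0 before encoding.
    """
    w, b = params
    if mask is None:
        mask = [False] * len(patches)
    visible = [0.0 if m else x for x, m in zip(patches, mask)]
    feats = []
    for i in range(len(visible)):
        window = visible[max(0, i - 1): i + 2]
        feats.append(w * sum(window) / len(window) + b)
    return feats


def mask_consistency_loss(student_feats, teacher_feats, mask):
    """Mean squared error between student and teacher features at masked positions."""
    errs = [(s - t) ** 2 for s, t, m in zip(student_feats, teacher_feats, mask) if m]
    return sum(errs) / len(errs) if errs else 0.0


patches = [0.2, 0.8, 0.5, 0.1]          # one scalar per image patch (toy data)
mask = [False, True, False, True]       # True marks a masked patch
student = [1.0, 0.0]                    # hypothetical (w, b) of the student
teacher = [0.5, 0.1]                    # hypothetical (w, b) of the teacher

target = encode(teacher, patches)       # teacher features from the full input
pred = encode(student, patches, mask)   # student features from the masked input
loss = mask_consistency_loss(pred, target, mask)

# In a real training loop the student would first be updated by gradient
# descent on this loss (plus, presumably, a classification loss); only the
# EMA step for the teacher is shown here.
teacher = ema_update(teacher, student, momentum=0.9)
```

The point of the sketch is the choice of target: the student is asked to match feature maps produced by a slowly moving teacher, so what it learns to reconstruct is whatever the features encode and not the raw image.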

Published

2025-04-11

How to Cite

Lin, Y., Wang, W., Luo, X., Wu, Z., Liu, C., Wen, J., & Xu, Y. (2025). Deep Hierarchies and Invariant Disease-Indicative Feature Learning for Computer Aided Diagnosis of Multiple Fundus Diseases. Proceedings of the AAAI Conference on Artificial Intelligence, 39(5), 5325–5333. https://doi.org/10.1609/aaai.v39i5.32566

Section

AAAI Technical Track on Computer Vision IV