Deep Multi-Task Learning for Diabetic Retinopathy Grading in Fundus Images


  • Xiaofei Wang Beihang University
  • Mai Xu Beihang University
  • Jicong Zhang Beihang University
  • Lai Jiang Beihang University
  • Liu Li Imperial College London


Applications, Healthcare, Medicine & Wellness


Recent years have witnessed growing interest in disease severity grading, especially for ocular diseases based on fundus images. Existing grading methods are usually trained on high-resolution (HR) images; however, their performance drops substantially on low-resolution (LR) images, which are common in practice. In this paper, we focus on diabetic retinopathy (DR) grading with LR fundus images. Our analysis of the DR task shows that: 1) image super-resolution (ISR) can boost the performance of both DR grading and lesion segmentation; 2) the lesion segmentation regions of fundus images are highly consistent with the pathological regions relevant to DR grading. Thus, we propose a deep multi-task learning based DR grading (DeepMT-DR) method for LR fundus images, which simultaneously handles the auxiliary tasks of ISR and lesion segmentation. Specifically, based on our findings, we propose a hierarchical deep learning structure that simultaneously processes the low-level task of ISR, the mid-level task of lesion segmentation, and the high-level task of DR grading. Moreover, a novel task-aware loss is developed to encourage ISR to focus on the pathological regions for its subsequent tasks: lesion segmentation and DR grading. Extensive experimental results show that our DeepMT-DR method significantly outperforms other state-of-the-art DR grading methods on two public datasets. In addition, our method achieves comparable performance on the two auxiliary tasks of ISR and lesion segmentation.
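The hierarchical three-level design described above can be sketched in PyTorch. This is a minimal illustrative mock-up, not the authors' architecture: the layer sizes, the 2x upscaling factor, and the `task_aware_loss` weighting (emphasizing super-resolution fidelity inside predicted lesion regions so that ISR serves the downstream tasks) are all assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DeepMTDRSketch(nn.Module):
    """Hypothetical sketch of the three-level multi-task structure:
    low-level ISR -> mid-level lesion segmentation -> high-level DR grading.
    Layer choices are illustrative, not the published architecture."""

    def __init__(self, num_grades=5, scale=2):
        super().__init__()
        # Low-level task: super-resolve the LR fundus image (assumed 2x here).
        self.isr = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),
        )
        # Mid-level task: per-pixel lesion mask from the super-resolved image.
        self.seg = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 1), nn.Sigmoid(),
        )
        # High-level task: grade from the SR image concatenated with the mask.
        self.grade = nn.Sequential(
            nn.Conv2d(4, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, num_grades),
        )

    def forward(self, lr_image):
        sr = self.isr(lr_image)                            # low-level output
        mask = self.seg(sr)                                # mid-level output
        logits = self.grade(torch.cat([sr, mask], dim=1))  # high-level output
        return sr, mask, logits


def task_aware_loss(sr, hr, mask, logits, grades, lam=1.0):
    """Hypothetical task-aware objective: a plain reconstruction term plus a
    lesion-weighted term that penalizes SR error inside predicted pathological
    regions, plus the grading cross-entropy."""
    pix = F.l1_loss(sr, hr)
    lesion_weighted = (mask.detach() * (sr - hr).abs()).mean()
    grading = F.cross_entropy(logits, grades)
    return pix + lam * lesion_weighted + grading
```

A forward pass on a batch of 64x64 LR images yields a 128x128 SR image, a same-size lesion mask, and one grade logit vector per image; all three heads are trained jointly through the single combined loss.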




How to Cite

Wang, X., Xu, M., Zhang, J., Jiang, L., & Li, L. (2021). Deep Multi-Task Learning for Diabetic Retinopathy Grading in Fundus Images. Proceedings of the AAAI Conference on Artificial Intelligence, 35(4), 2826-2834.



AAAI Technical Track on Computer Vision III