Mining Gaze for Contrastive Learning toward Computer-Assisted Diagnosis

Authors

  • Zihao Zhao, School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China
  • Sheng Wang, School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China; School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China; Shanghai United Imaging Intelligence Co., Ltd., Shanghai, China
  • Qian Wang, School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China; Shanghai Clinical Research and Trial Center, Shanghai, China
  • Dinggang Shen, School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials and Devices, ShanghaiTech University, Shanghai, China; Shanghai United Imaging Intelligence Co., Ltd., Shanghai, China; Shanghai Clinical Research and Trial Center, Shanghai, China

DOI:

https://doi.org/10.1609/aaai.v38i7.28586

Keywords:

CV: Representation Learning for Vision, DMKM: Applications

Abstract

Obtaining large-scale radiology reports for medical images can be difficult due to ethical concerns, limiting the effectiveness of contrastive pre-training in the medical image domain and underscoring the need for alternative methods. In this paper, we propose eye-tracking as an alternative to text reports, as it allows for the passive collection of gaze signals without ethical issues. By tracking the gaze of radiologists as they read and diagnose medical images, we can understand their visual attention and clinical reasoning. When a radiologist exhibits similar gaze patterns on two medical images, this may indicate semantic similarity for diagnosis, and the two images should be treated as a positive pair when pre-training a computer-assisted diagnosis (CAD) network through contrastive learning. Accordingly, we introduce Medical contrastive Gaze Image Pre-training (McGIP) as a plug-and-play module for contrastive learning frameworks. McGIP uses radiologist gaze to guide contrastive pre-training. We evaluate our method using two representative types of medical images and two common types of gaze data. The experimental results demonstrate the practicality of McGIP, indicating its high potential for various clinical scenarios and applications.
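
To make the pairing idea above concrete, the sketch below shows one way gaze similarity could drive positive-pair selection in an InfoNCE-style contrastive setup. It is only an illustration under assumed choices (cosine similarity of gaze heatmaps, a fixed threshold, and the hypothetical helpers gaze_positive_mask and gaze_contrastive_loss); it is not the paper's exact implementation.

```python
# A minimal sketch (not the authors' released code): treat images whose
# radiologist gaze heatmaps are similar as positive pairs in an
# InfoNCE-style contrastive loss. Function names, the cosine-similarity
# measure, and the 0.8 threshold are illustrative assumptions.

import torch
import torch.nn.functional as F


def gaze_positive_mask(gaze_heatmaps: torch.Tensor, threshold: float = 0.8) -> torch.Tensor:
    """Return a boolean (B, B) mask marking image pairs whose gaze
    heatmaps have cosine similarity above `threshold` (assumed rule)."""
    flat = F.normalize(gaze_heatmaps.flatten(start_dim=1), dim=1)
    sim = flat @ flat.t()          # pairwise cosine similarity of gaze maps
    return sim >= threshold


def gaze_contrastive_loss(features: torch.Tensor,
                          pos_mask: torch.Tensor,
                          temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style loss in which positives are defined by gaze
    similarity instead of only by augmented views of the same image."""
    z = F.normalize(features, dim=1)
    logits = z @ z.t() / temperature
    logits.fill_diagonal_(float("-inf"))        # drop self-comparisons from the denominator
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)

    pos = pos_mask.clone()
    pos.fill_diagonal_(False)                   # a sample is not its own positive here
    pos_count = pos.sum(dim=1).clamp(min=1)     # avoid division by zero

    # average log-probability over each sample's gaze-defined positives
    loss = -log_prob.masked_fill(~pos, 0.0).sum(dim=1) / pos_count
    return loss.mean()


if __name__ == "__main__":
    # Toy usage: `encoder` would be any backbone producing feature vectors.
    images = torch.randn(8, 1, 64, 64)          # e.g., a mini-batch of X-rays
    gaze_heatmaps = torch.rand(8, 64, 64)       # gaze maps collected via eye-tracking
    features = torch.randn(8, 128)              # stand-in for encoder(images)
    loss = gaze_contrastive_loss(features, gaze_positive_mask(gaze_heatmaps))
    print(loss.item())
```

Because the loss only changes how positives are selected, a sketch like this can be dropped into existing contrastive frameworks (e.g., SimCLR-style pipelines) without altering the backbone or the rest of the training loop.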

Published

2024-03-24

How to Cite

Zhao, Z., Wang, S., Wang, Q., & Shen, D. (2024). Mining Gaze for Contrastive Learning toward Computer-Assisted Diagnosis. Proceedings of the AAAI Conference on Artificial Intelligence, 38(7), 7543-7551. https://doi.org/10.1609/aaai.v38i7.28586

Issue

Vol. 38 No. 7 (2024)

Section

AAAI Technical Track on Computer Vision VI