Mining Gaze for Contrastive Learning toward Computer-Assisted Diagnosis
DOI:
https://doi.org/10.1609/aaai.v38i7.28586
Keywords:
CV: Representation Learning for Vision, DMKM: Applications
Abstract
Obtaining large-scale radiology reports for medical images can be difficult due to ethical concerns, which limits the effectiveness of contrastive pre-training in the medical image domain and underscores the need for alternative methods. In this paper, we propose eye-tracking as an alternative to text reports, as it allows gaze signals to be collected passively and without ethical issues. By tracking radiologists' gaze as they read and diagnose medical images, we can understand their visual attention and clinical reasoning. When a radiologist exhibits similar gaze patterns on two medical images, the images are likely semantically similar for diagnosis, and they should be treated as a positive pair when pre-training a computer-assisted diagnosis (CAD) network through contrastive learning. Accordingly, we introduce Medical contrastive Gaze Image Pre-training (McGIP), a plug-and-play module for contrastive learning frameworks that uses radiologists' gaze to guide contrastive pre-training. We evaluate our method on two representative types of medical images and two common types of gaze data. The experimental results demonstrate the practicality of McGIP and indicate its high potential for various clinical scenarios and applications.
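The core idea, treating images whose gaze patterns are similar as positive pairs in contrastive pre-training, can be illustrated with a minimal NumPy sketch. The cosine-similarity criterion, the threshold, and the InfoNCE-style loss below are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def gaze_positive_pairs(heatmaps, threshold=0.8):
    """Hypothetical criterion: mark two images as a positive pair when the
    cosine similarity of their radiologist gaze heatmaps exceeds a threshold."""
    flat = heatmaps.reshape(len(heatmaps), -1)
    flat = flat / np.linalg.norm(flat, axis=1, keepdims=True)
    sim = flat @ flat.T
    return sim >= threshold  # boolean mask of gaze-derived positive pairs

def gaze_contrastive_loss(embeddings, pos_mask, temperature=0.1):
    """InfoNCE-style loss where gaze-derived positives replace the usual
    augmentation-based positives (illustrative, not the authors' code)."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    logits = (z @ z.T) / temperature
    np.fill_diagonal(logits, -np.inf)  # exclude self-similarity
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    pos = pos_mask.copy()
    np.fill_diagonal(pos, False)
    # average log-likelihood over each anchor's positive set (0 if empty)
    pos_count = pos.sum(axis=1)
    pos_logprob = np.where(pos, log_prob, 0.0).sum(axis=1)
    per_anchor = np.where(pos_count > 0,
                          pos_logprob / np.maximum(pos_count, 1), 0.0)
    return -per_anchor.mean()
```

In a full pipeline, this mask would plug into an existing contrastive framework (e.g., a SimCLR-like setup) in place of augmentation-defined positives, which is what makes the module plug-and-play.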
Published
2024-03-24
How to Cite
Zhao, Z., Wang, S., Wang, Q., & Shen, D. (2024). Mining Gaze for Contrastive Learning toward Computer-Assisted Diagnosis. Proceedings of the AAAI Conference on Artificial Intelligence, 38(7), 7543-7551. https://doi.org/10.1609/aaai.v38i7.28586
Section
AAAI Technical Track on Computer Vision VI