MIRNet: Integrating Constrained Graph-Based Reasoning with Pre-training for Diagnostic Medical Imaging

Shufeng Kong; Zijie Wang; Nuan Cui; Hao Tang; Yihan Meng; Yuanyuan Wei; Feifan Chen; Yingheng Wang; Zhuo Cai; Yaonan Wang; Yulong Zhang; Yuzheng Li; Zibin Zheng; Caihua Liu; Hao Liang

doi:10.1609/aaai.v40i45.41218

Authors

Shufeng Kong: School of Software Engineering, Sun Yat-sen University, Zhuhai, China; Department of Computer Science, Cornell University, Ithaca, NY, USA
Zijie Wang: School of Software Engineering, Sun Yat-sen University, Zhuhai, China
Nuan Cui: Institute of TCM Diagnostics, Hunan University of Chinese Medicine, Changsha, China
Hao Tang: Institute of TCM Diagnostics, Hunan University of Chinese Medicine, Changsha, China
Yihan Meng: Institute of TCM Diagnostics, Hunan University of Chinese Medicine, Changsha, China
Yuanyuan Wei: School of Software Engineering, Sun Yat-sen University, Zhuhai, China
Feifan Chen: School of Software Engineering, Sun Yat-sen University, Zhuhai, China
Yingheng Wang: Department of Computer Science, Cornell University, Ithaca, NY, USA
Zhuo Cai: Merchants Union Consumer Finance Company Limited (MUCFC), Shenzhen, China
Yaonan Wang: Merchants Union Consumer Finance Company Limited (MUCFC), Shenzhen, China
Yulong Zhang: The Fifth Affiliated Hospital, Sun Yat-sen University, Zhuhai, China
Yuzheng Li: School of Software Engineering, Sun Yat-sen University, Zhuhai, China
Zibin Zheng: School of Software Engineering, Sun Yat-sen University, Zhuhai, China
Caihua Liu: School of Artificial Intelligence, Guilin University of Electronic Technology, Guilin, China; Department of Computer Science, Cornell University, Ithaca, NY, USA
Hao Liang: Institute of TCM Diagnostics, Hunan University of Chinese Medicine, Changsha, China

Abstract

Automated interpretation of medical images demands robust modeling of complex visual-semantic relationships while addressing annotation scarcity, label imbalance, and clinical plausibility constraints. We introduce MIRNet (Medical Image Reasoner Network), a novel framework that integrates self-supervised pre-training with constrained graph-based reasoning. Tongue image diagnosis, the setting studied here, is a particularly challenging domain that requires fine-grained visual and semantic understanding. Our approach leverages a self-supervised masked autoencoder (MAE) to learn transferable visual representations from unlabeled data; employs graph attention networks (GAT) to model label correlations through expert-defined structured graphs; enforces clinical priors via constraint-aware optimization using KL divergence and regularization losses; and mitigates label imbalance using asymmetric loss (ASL) and boosting ensembles. To address annotation scarcity, we also introduce TongueAtlas-4K, a comprehensive expert-curated benchmark comprising 4,000 images annotated with 22 diagnostic labels, which represents the largest public dataset in tongue analysis. Validation shows that our method achieves state-of-the-art performance. While optimized for tongue diagnosis, the framework readily generalizes to broader diagnostic medical imaging tasks.
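
The abstract names asymmetric loss (ASL) as the tool for label imbalance but gives no formula or hyperparameters. As a rough illustration only, the sketch below follows the commonly published ASL formulation for multi-label classification: positives are weighted by (1 - p)^gamma_pos, and negatives use a margin-shifted probability p_m = max(p - margin, 0) weighted by p_m^gamma_neg. The function name and the default values of gamma_pos, gamma_neg, and margin are assumptions chosen for the example, not settings reported for MIRNet.

```python
import math


def asymmetric_loss(probs, targets, gamma_pos=0.0, gamma_neg=4.0,
                    margin=0.05, eps=1e-8):
    """Mean asymmetric loss over the labels of one sample.

    probs:   predicted probabilities in [0, 1], one per label
    targets: 0/1 ground-truth values, one per label
    The default hyperparameters are illustrative assumptions.
    """
    if len(probs) != len(targets):
        raise ValueError("probs and targets must have the same length")
    total = 0.0
    for p, y in zip(probs, targets):
        if y == 1:
            # Positive label: focal-style weight with a small (or zero) exponent,
            # so rare positives keep most of their gradient signal.
            total += -((1.0 - p) ** gamma_pos) * math.log(max(p, eps))
        else:
            # Negative label: shift the probability down by the margin so that
            # easy negatives (p <= margin) contribute exactly zero loss.
            p_m = max(p - margin, 0.0)
            total += -(p_m ** gamma_neg) * math.log(max(1.0 - p_m, eps))
    return total / len(probs)


if __name__ == "__main__":
    # One hypothetical sample with 4 labels: two present, two absent.
    print(asymmetric_loss([0.9, 0.2, 0.03, 0.6], [1, 1, 0, 0]))
```

With these defaults, a negative label predicted at or below the margin adds nothing to the loss, while a missed positive is penalized at the full cross-entropy rate, which is the asymmetry that makes the loss suited to datasets where most labels are absent for most images.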
