MIRNet: Integrating Constrained Graph-Based Reasoning with Pre-training for Diagnostic Medical Imaging

Authors

  • Shufeng Kong School of Software Engineering, Sun Yat-sen University, Zhuhai, China; Department of Computer Science, Cornell University, Ithaca, NY, USA
  • Zijie Wang School of Software Engineering, Sun Yat-sen University, Zhuhai, China
  • Nuan Cui Institute of TCM Diagnostics, Hunan University of Chinese Medicine, Changsha, China
  • Hao Tang Institute of TCM Diagnostics, Hunan University of Chinese Medicine, Changsha, China
  • Yihan Meng Institute of TCM Diagnostics, Hunan University of Chinese Medicine, Changsha, China
  • Yuanyuan Wei School of Software Engineering, Sun Yat-sen University, Zhuhai, China
  • Feifan Chen School of Software Engineering, Sun Yat-sen University, Zhuhai, China
  • Yingheng Wang Department of Computer Science, Cornell University, Ithaca, NY, USA
  • Zhuo Cai Merchants Union Consumer Finance Company Limited (MUCFC), Shenzhen, China
  • Yaonan Wang Merchants Union Consumer Finance Company Limited (MUCFC), Shenzhen, China
  • Yulong Zhang The Fifth Affiliated Hospital, Sun Yat-sen University, Zhuhai, China
  • Yuzheng Li School of Software Engineering, Sun Yat-sen University, Zhuhai, China
  • Zibin Zheng School of Software Engineering, Sun Yat-sen University, Zhuhai, China
  • Caihua Liu School of Artificial Intelligence, Guilin University of Electronic Technology, Guilin, China; Department of Computer Science, Cornell University, Ithaca, NY, USA
  • Hao Liang Institute of TCM Diagnostics, Hunan University of Chinese Medicine, Changsha, China

DOI:

https://doi.org/10.1609/aaai.v40i45.41218

Abstract

Automated interpretation of medical images demands robust modeling of complex visual-semantic relationships while addressing annotation scarcity, label imbalance, and clinical plausibility constraints. We introduce MIRNet (Medical Image Reasoner Network), a novel framework that integrates self-supervised pre-training with constrained graph-based reasoning. We focus on tongue image diagnosis, a particularly challenging domain that requires fine-grained visual and semantic understanding. Our approach leverages a self-supervised masked autoencoder (MAE) to learn transferable visual representations from unlabeled data; employs graph attention networks (GAT) to model label correlations through expert-defined structured graphs; enforces clinical priors via constraint-aware optimization using KL divergence and regularization losses; and mitigates label imbalance using asymmetric loss (ASL) and boosting ensembles. To address annotation scarcity, we also introduce TongueAtlas-4K, a comprehensive expert-curated benchmark comprising 4,000 images annotated with 22 diagnostic labels, representing the largest public dataset in tongue analysis. Validation shows that our method achieves state-of-the-art performance. While optimized for tongue diagnosis, the framework readily generalizes to broader diagnostic medical imaging tasks.
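
As a rough illustration of the asymmetric loss (ASL) mentioned in the abstract, the sketch below shows one common formulation for multi-label classification: positives use a focal-style term, while negatives are probability-shifted by a margin and down-weighted more strongly. This is not the authors' implementation. The function name, the per-sample averaging, and the default values of gamma_pos, gamma_neg, and margin are assumptions chosen for illustration, and the abstract does not state which settings MIRNet uses.

```python
import math


def asymmetric_loss(probs, targets, gamma_pos=0.0, gamma_neg=4.0,
                    margin=0.05, eps=1e-8):
    """Mean asymmetric loss over the labels of one sample.

    probs   -- predicted probabilities in [0, 1], one per label
    targets -- ground-truth labels (1 = present, 0 = absent)
    All hyperparameter defaults are illustrative assumptions.
    """
    total = 0.0
    for p, y in zip(probs, targets):
        if y == 1:
            # Positive label: focal-style weighting with exponent gamma_pos.
            total += -((1.0 - p) ** gamma_pos) * math.log(max(p, eps))
        else:
            # Negative label: shift the probability down by the margin so
            # that very easy negatives (p <= margin) contribute nothing,
            # then down-weight the rest with exponent gamma_neg.
            p_m = max(p - margin, 0.0)
            total += -(p_m ** gamma_neg) * math.log(max(1.0 - p_m, eps))
    return total / len(probs)


if __name__ == "__main__":
    # Hypothetical predictions for four labels, of which one is present.
    print(asymmetric_loss([0.9, 0.2, 0.03, 0.6], [1, 0, 0, 0]))
```

With gamma_pos = gamma_neg = 0 and margin = 0, the expression reduces to ordinary binary cross-entropy, which makes the effect of the asymmetry easy to compare against a standard baseline.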

Published

2026-03-14

How to Cite

Kong, S., Wang, Z., Cui, N., Tang, H., Meng, Y., Wei, Y., … Liang, H. (2026). MIRNet: Integrating Constrained Graph-Based Reasoning with Pre-training for Diagnostic Medical Imaging. Proceedings of the AAAI Conference on Artificial Intelligence, 40(45), 38746–38753. https://doi.org/10.1609/aaai.v40i45.41218

Section

AAAI Special Track on AI for Social Impact I