Disentangling for Transfer: Boosting Limited Modalities via Information-Theoretic Regularization and Cross-Modal Reconstruction

Authors

  • Zhiyun Zhang (DAMO Academy, Alibaba Group; Carnegie Mellon University)
  • Yan-Jie Zhou (DAMO Academy, Alibaba Group; Hupan Laboratory; College of Computer Science and Technology, Zhejiang University)
  • Yujian Hu (School of Medicine, Zhejiang University; Department of Vascular Surgery, The First Affiliated Hospital of Zhejiang University School of Medicine)
  • Xiyao Ma (DAMO Academy, Alibaba Group; Institute of Automation, Chinese Academy of Sciences)
  • Zhouhang Yuan (DAMO Academy, Alibaba Group; School of Medicine, Zhejiang University; College of Computer Science and Technology, Zhejiang University)
  • Zirui Wang (DAMO Academy, Alibaba Group; Hupan Laboratory)
  • Hongkun Zhang (Department of Vascular Surgery, The First Affiliated Hospital of Zhejiang University School of Medicine)
  • Minfeng Xu (DAMO Academy, Alibaba Group; Hupan Laboratory)

DOI:

https://doi.org/10.1609/aaai.v40i15.38305

Abstract

The absence of critical modalities in medical imaging poses significant challenges for AI-driven diagnostic systems, particularly in scenarios where limited modalities must suffice for downstream tasks. Existing approaches often fail either to fully leverage privileged features that are available only during training or to address the information gap between privileged and limited modalities, resulting in suboptimal performance. To address this, we propose a unified, dual-stage Disentanglement-AligNmenT framEwork (DANTE), which uses Information-Theoretic Regularization and Cross-Modal Reconstruction to decompose full-modality information into alignable and privileged-exclusive components. In the first stage, a self-supervised pre-training strategy based on cross-modal reconstruction acts as a proxy task that implicitly incentivizes disentangled representations. In the second stage, we present an information-theoretic regularization that explicitly maximizes the transfer of privileged knowledge through two novel modules: (1) a Mutual Alignment Module that employs multilevel bidirectional alignment between limited-modality features and alignable features, enhancing cross-modal representation consistency; and (2) a Privileged Compaction Module that restricts the privileged-exclusive information flow, promoting the integration of task-relevant content into alignable representations. Experimental results on three challenging medical datasets show that DANTE achieves state-of-the-art performance, demonstrating its effectiveness in leveraging privileged guidance under modality scarcity and its broad applicability across diverse medical imaging scenarios.
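
To make the two second-stage ideas concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that bidirectional alignment can be approximated by a symmetric InfoNCE contrastive loss at a single feature level, and that "restricting the privileged-exclusive information flow" can be approximated by a KL penalty toward a standard normal prior, as in a variational information bottleneck. The abstract specifies neither choice. All function names (`info_nce`, `mutual_alignment_loss`, `compaction_penalty`) and the `temperature` parameter are hypothetical.

```python
import math


def _normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def info_nce(queries, keys, temperature=0.1):
    """Mean InfoNCE loss in which queries[i] should match keys[i].

    Assumption: cosine similarity with a fixed temperature; other keys
    in the batch act as negatives.
    """
    q = [_normalize(v) for v in queries]
    k = [_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_norm = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_norm - logits[i]
    return total / len(q)


def mutual_alignment_loss(limited, alignable, temperature=0.1):
    """Hypothetical bidirectional alignment: average of both directions."""
    forward = info_nce(limited, alignable, temperature)
    backward = info_nce(alignable, limited, temperature)
    return 0.5 * (forward + backward)


def compaction_penalty(mu, log_var):
    """Batch mean of KL(N(mu, diag(exp(log_var))) || N(0, I)).

    Assumption: the privileged-exclusive code is a diagonal Gaussian; a
    larger penalty weight leaves less capacity in that branch.
    """
    total = 0.0
    for mu_vec, lv_vec in zip(mu, log_var):
        total += 0.5 * sum(
            m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu_vec, lv_vec)
        )
    return total / len(mu)


if __name__ == "__main__":
    limited = [[1.0, 0.0], [0.0, 1.0]]
    alignable = [[1.0, 0.0], [0.0, 1.0]]
    print(mutual_alignment_loss(limited, alignable))
    print(compaction_penalty([[1.0, 0.0]], [[0.0, 0.0]]))
```

In this sketch, a training objective would add the task loss to weighted copies of the two terms. The weights, the feature levels at which alignment is applied, and the actual estimators are design choices that only the full paper can confirm.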

Published

2026-03-14

How to Cite

Zhang, Z., Zhou, Y.-J., Hu, Y., Ma, X., Yuan, Z., Wang, Z., … Xu, M. (2026). Disentangling for Transfer: Boosting Limited Modalities via Information-Theoretic Regularization and Cross-Modal Reconstruction. Proceedings of the AAAI Conference on Artificial Intelligence, 40(15), 13052–13060. https://doi.org/10.1609/aaai.v40i15.38305

Issue

Vol. 40 No. 15

Section

AAAI Technical Track on Computer Vision XII