Disentangling for Transfer: Boosting Limited Modalities via Information-Theoretic Regularization and Cross-Modal Reconstruction

Zhiyun Zhang; Yan-Jie Zhou; Yujian Hu; Xiyao Ma; Zhouhang Yuan; Zirui Wang; Hongkun Zhang; Minfeng Xu

doi:10.1609/aaai.v40i15.38305

Authors

Zhiyun Zhang: DAMO Academy, Alibaba Group; Carnegie Mellon University
Yan-Jie Zhou: DAMO Academy, Alibaba Group; Hupan Laboratory; College of Computer Science and Technology, Zhejiang University
Yujian Hu: School of Medicine, Zhejiang University; Department of Vascular Surgery, The First Affiliated Hospital of Zhejiang University School of Medicine
Xiyao Ma: DAMO Academy, Alibaba Group; Institute of Automation, Chinese Academy of Sciences
Zhouhang Yuan: DAMO Academy, Alibaba Group; School of Medicine, Zhejiang University; College of Computer Science and Technology, Zhejiang University
Zirui Wang: DAMO Academy, Alibaba Group; Hupan Laboratory
Hongkun Zhang: Department of Vascular Surgery, The First Affiliated Hospital of Zhejiang University School of Medicine
Minfeng Xu: DAMO Academy, Alibaba Group; Hupan Laboratory

Abstract

Missing critical modalities in medical imaging poses significant challenges for AI-driven diagnostic systems, particularly in scenarios where limited modalities must suffice for downstream tasks. Existing approaches often fail either to fully leverage privileged features available only at training time or to address the information gap between privileged and limited modalities, resulting in suboptimal performance. To address this, we propose a unified, dual-stage Disentanglement-AligNmenT framEwork (DANTE), which uses Information-Theoretic Regularization and Cross-Modal Reconstruction to decompose full-modality information into alignable and privileged-exclusive components. In the first stage, a self-supervised pre-training strategy based on cross-modal reconstruction acts as a proxy task to implicitly incentivize disentangled representations. In the second stage, we present an information-theoretic regularization to explicitly maximize the transfer of privileged knowledge through two novel modules: (1) a Mutual Alignment Module that employs multilevel bidirectional alignment between limited-modality features and alignable features, enhancing cross-modal representation consistency; (2) a Privileged Compaction Module that restricts the privileged-exclusive information flow, promoting the integration of task-relevant content into alignable representations. Experimental results on three challenging medical datasets show that DANTE achieves state-of-the-art performance, demonstrating its effectiveness in leveraging privileged guidance under modality scarcity and its broad applicability across diverse medical imaging scenarios.
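
The abstract does not give the exact form of the second-stage objective, so the following is only an illustrative sketch of how such a regularizer could be assembled, not the DANTE implementation. It assumes two common stand-ins: a symmetric InfoNCE loss for the bidirectional alignment between limited-modality and alignable features, and a KL term toward a standard normal prior (as in a variational information bottleneck) to restrict the privileged-exclusive code. All function names, the temperature, and the weight `beta` are hypothetical. Plain Python is used to keep the example dependency-free; a real model would use an autodiff framework.

```python
import math


def _normalize(vec):
    """Scale a feature vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce(a, b, temperature=0.1):
    """One-directional InfoNCE: row i of `a` should match row i of `b`."""
    a = [_normalize(v) for v in a]
    b = [_normalize(v) for v in b]
    total = 0.0
    for i, a_i in enumerate(a):
        # Cosine similarity of sample i against every candidate in b.
        logits = [sum(x * y for x, y in zip(a_i, b_j)) / temperature for b_j in b]
        peak = max(logits)  # subtract the max for numerical stability
        log_z = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_z - logits[i]  # negative log-probability of the true pair
    return total / len(a)


def bidirectional_alignment(limited, alignable, temperature=0.1):
    """Assumed stand-in for mutual alignment: average of both directions."""
    return 0.5 * (info_nce(limited, alignable, temperature)
                  + info_nce(alignable, limited, temperature))


def kl_to_standard_normal(mu, log_var):
    """Batch mean of KL(N(mu, diag(exp(log_var))) || N(0, I))."""
    per_sample = [
        0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu_i, lv_i))
        for mu_i, lv_i in zip(mu, log_var)
    ]
    return sum(per_sample) / len(per_sample)


def stage_two_regularizer(limited, alignable, priv_mu, priv_log_var, beta=0.01):
    """Hypothetical combined regularizer: alignment plus weighted compaction."""
    return (bidirectional_alignment(limited, alignable)
            + beta * kl_to_standard_normal(priv_mu, priv_log_var))
```

Under these assumptions, the alignment term is smallest when each limited-modality feature is closest to its own alignable counterpart, and the KL term penalizes any information carried by the privileged-exclusive code beyond the prior, which is one way to push task-relevant content toward the alignable branch.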
