LUMIN: A Longitudinal Multi-modal Knowledge Decomposition Network for Predicting Breast Cancer Recurrence
DOI: https://doi.org/10.1609/aaai.v40i9.37693

Abstract
Accurate prediction of breast cancer recurrence after treatment is essential for improving long-term outcomes. However, existing models are limited by three key challenges: (1) they typically rely on single-modal data, missing cross-modal interactions; (2) they analyze static snapshots, failing to capture disease progression over time; and (3) they often perform coarse feature fusion, lacking semantic disentanglement and interpretability. To address these issues, we propose LUMIN (Longitudinal Multi-modal Knowledge Decomposition Network), a novel framework that integrates longitudinal mammograms and electronic health records (EHRs) for recurrence prediction. LUMIN leverages a vision-language contrastive pretraining backbone to align multi-modal representations and introduces two knowledge extraction modules: (1) a Cross-Modal Disentangled Knowledge Extractor (CM-DKE) that separates shared, complementary, and modality-specific information across imaging and text; and (2) a Temporal Evolution Disentangled Knowledge Extractor (TE-DKE) that captures time-invariant, time-varying, and time-specific features to model disease dynamics. Experiments on a large-scale dataset of 3,924 patients and 19,684 exams show that LUMIN significantly outperforms state-of-the-art baselines, demonstrating its effectiveness in capturing both multi-modal semantics and temporal heterogeneity for recurrence prediction.

Published
2026-03-14
How to Cite
Lu, C., Zhang, T., Liang, X., Gao, Y., Han, L., Wang, X., … Mann, R. (2026). LUMIN: A Longitudinal Multi-modal Knowledge Decomposition Network for Predicting Breast Cancer Recurrence. Proceedings of the AAAI Conference on Artificial Intelligence, 40(9), 7530–7538. https://doi.org/10.1609/aaai.v40i9.37693
Section
AAAI Technical Track on Computer Vision VI
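The cross-modal decomposition described in the abstract — splitting paired image and text embeddings into shared, complementary, and modality-specific components — can be illustrated with a minimal NumPy sketch. This is a hypothetical toy, not the paper's actual CM-DKE: the projection heads, dimensions, and the averaging/difference formulation below are illustrative assumptions, and the real module is learned end-to-end with the vision-language backbone.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_head(d_in, d_out):
    # Hypothetical linear projection head (random weights stand in
    # for parameters that would be learned in the real model).
    return rng.normal(scale=d_in ** -0.5, size=(d_in, d_out))

d, k = 128, 32
img_feat = rng.normal(size=(1, d))  # stand-in mammogram embedding
txt_feat = rng.normal(size=(1, d))  # stand-in EHR/text embedding

# One head per knowledge type, in the spirit of CM-DKE's three factors.
W_shared = make_head(d, k)
W_comp = make_head(d, k)
W_spec_img = make_head(d, k)
W_spec_txt = make_head(d, k)

# Shared: information both modalities agree on (here, a simple average).
shared = (img_feat @ W_shared + txt_feat @ W_shared) / 2
# Complementary: information one modality adds beyond the other.
complementary = (img_feat - txt_feat) @ W_comp
# Modality-specific: projected from each modality independently.
spec_img = img_feat @ W_spec_img
spec_txt = txt_feat @ W_spec_txt

# Disentangled factors are concatenated into one fused representation.
fused = np.concatenate([shared, complementary, spec_img, spec_txt], axis=1)
print(fused.shape)  # (1, 128) -- four k=32 factors
```

In the actual model these factors would be trained with objectives that keep them disentangled (e.g. alignment losses on the shared part), whereas this sketch only shows the data flow.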