SpaCRD: Multimodal Deep Fusion of Histology and Spatial Transcriptomics for Cancer Region Detection

Authors

  • Shuailin Xue, School of Information Science and Engineering, Yunnan University, Kunming 650500, China; Yunnan Key Laboratory of Intelligent Systems and Computing, Yunnan University, Kunming 650500, China
  • Jun Wan, School of Information Engineering, Zhongnan University of Economics and Law, Wuhan 430073, China
  • Lihua Zhang, School of Artificial Intelligence, School of Computer Science, Wuhan University, Wuhan 430072, China
  • Wenwen Min, School of Information Science and Engineering, Yunnan University, Kunming 650500, China; Yunnan Key Laboratory of Intelligent Systems and Computing, Yunnan University, Kunming 650500, China

DOI:

https://doi.org/10.1609/aaai.v40i14.38135

Abstract

Accurate detection of cancer tissue regions (CTR) enables deeper analysis of the tumor microenvironment and offers crucial insights into treatment response. Traditional CTR detection methods, which typically rely on the rich cellular morphology in histology images, are prone to false positives because different tissue regions can look morphologically similar. Advances in spatial transcriptomics (ST) provide detailed cellular phenotypes and spatial localization information, offering new opportunities for more accurate cancer region detection. However, current methods cannot effectively integrate histology images with ST data for CTR detection, especially in cross-sample and cross-platform/batch settings. To address this challenge, we propose SpaCRD, a transfer learning-based method that deeply integrates histology images and ST data to enable reliable CTR detection across diverse samples, platforms, and batches. Once trained on source data, SpaCRD can be readily generalized to detect cancerous regions in samples from different platforms and batches. The core of SpaCRD is a category-regularized variational reconstruction-guided bidirectional cross-attention fusion network, which enables the model to adaptively capture latent co-expression patterns between histological features and gene expression from multiple perspectives. Extensive benchmark analysis on 23 matched histology-ST datasets spanning various disease types, platforms, and batches demonstrates that SpaCRD consistently outperforms eight existing state-of-the-art methods in CTR detection.
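
For readers unfamiliar with the term, the general idea of bidirectional cross-attention between two modalities can be sketched as follows. This is a minimal NumPy illustration and not the SpaCRD architecture: it omits learned projections, multiple heads, and the category-regularized variational reconstruction described in the abstract. The function names (`cross_attention`, `bidirectional_fusion`), the feature dimensions, and the random inputs are all assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row-wise max for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values):
    # Scaled dot-product attention: every query row attends over
    # all rows of the other modality (no learned weights here).
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ keys_values

def bidirectional_fusion(img_feats, gene_feats):
    # Direction 1: histology features query gene-expression features.
    img_ctx = cross_attention(img_feats, gene_feats)
    # Direction 2: gene-expression features query histology features.
    gene_ctx = cross_attention(gene_feats, img_feats)
    # Residual connection per modality, then concatenate per spot.
    return np.concatenate([img_feats + img_ctx, gene_feats + gene_ctx], axis=-1)

# Hypothetical toy data: 5 spots, 8-dimensional embedding per modality.
rng = np.random.default_rng(0)
img = rng.normal(size=(5, 8))
gene = rng.normal(size=(5, 8))
fused = bidirectional_fusion(img, gene)
print(fused.shape)  # (5, 16)
```

Each spot ends up with a fused vector in which the image half has been informed by expression and the expression half by the image; a downstream classifier could then score each spot as cancerous or not.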

Published

2026-03-14

How to Cite

Xue, S., Wan, J., Zhang, L., & Min, W. (2026). SpaCRD: Multimodal Deep Fusion of Histology and Spatial Transcriptomics for Cancer Region Detection. Proceedings of the AAAI Conference on Artificial Intelligence, 40(14), 11523-11531. https://doi.org/10.1609/aaai.v40i14.38135

Issue

Vol. 40 No. 14 (2026)

Section

AAAI Technical Track on Computer Vision XI