TMFormer: Token Merging Transformer for Brain Tumor Segmentation with Missing Modalities

Authors

  • Zheyu Zhang University of Science and Technology of China, Hefei 230026, China
  • Gang Yang University of Science and Technology of China, Hefei 230026, China
  • Yueyi Zhang University of Science and Technology of China, Hefei 230026, China; Hefei Comprehensive National Science Center, Institute of Artificial Intelligence, Hefei 230088, China
  • Huanjing Yue Tianjin University, Tianjin 300072, China
  • Aiping Liu University of Science and Technology of China, Hefei 230026, China
  • Yunwei Ou Beijing Tiantan Hospital, Capital Medical University, Beijing 100050, China; Hefei Comprehensive National Science Center, Institute of Artificial Intelligence, Hefei 230088, China
  • Jian Gong Beijing Tiantan Hospital, Capital Medical University, Beijing 100050, China; Hefei Comprehensive National Science Center, Institute of Artificial Intelligence, Hefei 230088, China
  • Xiaoyan Sun University of Science and Technology of China, Hefei 230026, China; Hefei Comprehensive National Science Center, Institute of Artificial Intelligence, Hefei 230088, China

DOI:

https://doi.org/10.1609/aaai.v38i7.28572

Keywords:

CV: Medical and Biological Imaging, CV: Segmentation

Abstract

Numerous techniques excel in brain tumor segmentation using multi-modal magnetic resonance imaging (MRI) sequences, delivering exceptional results. However, the prevalent absence of modalities in clinical scenarios hampers performance. Current approaches frequently resort to zero maps as substitutes for missing modalities, inadvertently introducing feature bias and redundant computations. To address these issues, we present the Token Merging transFormer (TMFormer) for robust brain tumor segmentation with missing modalities. TMFormer tackles these challenges by extracting and merging accessible modalities into more compact token sequences. The architecture comprises two core components: the Uni-modal Token Merging Block (UMB) and the Multi-modal Token Merging Block (MMB). The UMB enhances individual modality representation by adaptively consolidating spatially redundant tokens within and outside tumor-related regions, thereby refining token sequences for augmented representational capacity. Meanwhile, the MMB mitigates multi-modal feature fusion bias, exclusively leveraging tokens from present modalities and merging them into a unified multi-modal representation to accommodate varying modality combinations. Extensive experimental results on the BraTS 2018 and 2020 datasets demonstrate the superiority and efficacy of TMFormer compared to state-of-the-art methods when dealing with missing modalities.
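
The feature bias that the abstract attributes to zero-map substitution can be illustrated with a small sketch. This is not the authors' MMB: token-wise averaging stands in for the learned merging, and the function names, the modality labels (T1, T1ce, T2, FLAIR), and the toy values are all assumptions made for illustration. It only shows why fusing tokens from present modalities alone differs from fusing with zero placeholders.

```python
from typing import Dict, List, Optional

# One modality is a list of N tokens, each a D-dimensional vector.
Tokens = List[List[float]]


def fuse_present(modalities: Dict[str, Optional[Tokens]]) -> Tokens:
    """Average token-wise over the modalities that are actually present.

    Hypothetical stand-in for a learned multi-modal merge; missing
    modalities (None) are skipped rather than replaced.
    """
    present = [t for t in modalities.values() if t is not None]
    if not present:
        raise ValueError("at least one modality is required")
    n_tokens, dim = len(present[0]), len(present[0][0])
    return [
        [sum(m[i][d] for m in present) / len(present) for d in range(dim)]
        for i in range(n_tokens)
    ]


def fuse_zero_filled(
    modalities: Dict[str, Optional[Tokens]], n_tokens: int, dim: int
) -> Tokens:
    """Baseline: substitute zero tokens for missing modalities, then average."""
    zeros = [[0.0] * dim for _ in range(n_tokens)]
    filled = [t if t is not None else zeros for t in modalities.values()]
    return [
        [sum(m[i][d] for m in filled) / len(filled) for d in range(dim)]
        for i in range(n_tokens)
    ]


# Toy input: one 2-dimensional token per modality, two modalities missing.
scans: Dict[str, Optional[Tokens]] = {
    "T1": [[2.0, 4.0]],
    "T1ce": None,
    "T2": [[4.0, 8.0]],
    "FLAIR": None,
}

print(fuse_present(scans))            # [[3.0, 6.0]]
print(fuse_zero_filled(scans, 1, 2))  # [[1.5, 3.0]]
```

In this toy case the zero-filled result is pulled toward zero in proportion to the number of missing modalities, and the placeholder tokens are still processed. Skipping them keeps the fused scale independent of which modalities are available.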

Published

2024-03-24

How to Cite

Zhang, Z., Yang, G., Zhang, Y., Yue, H., Liu, A., Ou, Y., Gong, J., & Sun, X. (2024). TMFormer: Token Merging Transformer for Brain Tumor Segmentation with Missing Modalities. Proceedings of the AAAI Conference on Artificial Intelligence, 38(7), 7414-7422. https://doi.org/10.1609/aaai.v38i7.28572

Issue

Vol. 38 No. 7 (2024)

Section

AAAI Technical Track on Computer Vision VI