Self-supervised Multiplex Consensus Mamba for General Image Fusion

Authors

  • Yingying Wang, Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, China
  • Rongjin Zhuang, Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, China
  • Hui Zheng, Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, China
  • Xuanhua He, The Hong Kong University of Science and Technology
  • Ke Cao, University of Science and Technology of China
  • Xiaotong Tu, Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, China
  • Xinghao Ding, Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, China

DOI:

https://doi.org/10.1609/aaai.v40i22.38932

Abstract

Image fusion integrates complementary information from different modalities to generate high-quality fused images, thereby enhancing downstream tasks such as object detection and semantic segmentation. Unlike task-specific techniques that primarily focus on consolidating inter-modal information, general image fusion needs to address a wide range of tasks while improving performance without increasing complexity. To achieve this, we propose SMC-Mamba, a Self-supervised Multiplex Consensus Mamba framework for general image fusion. Specifically, the Modality-Agnostic Feature Enhancement (MAFE) module preserves fine details through adaptive gating and enhances global representations via spatial-channel and frequency rotational scanning. The Multiplex Consensus Cross-modal Mamba (MCCM) module enables dynamic collaboration among experts, reaching a consensus to efficiently integrate complementary information from multiple modalities. The cross-modal scanning within MCCM further strengthens feature interactions across modalities, facilitating seamless integration of critical information from both sources. Additionally, we introduce a Bi-level Self-supervised Contrastive Learning Loss (BSCL), which preserves high-frequency information without increasing computational overhead while simultaneously boosting performance in downstream tasks. Extensive experiments demonstrate that our approach outperforms state-of-the-art (SOTA) image fusion algorithms in tasks such as infrared-visible, medical, multi-focus, and multi-exposure fusion, as well as downstream visual tasks.
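The abstract describes MAFE as preserving fine details "through adaptive gating" while blending in enhanced global representations. The paper's actual implementation is not given on this page; as a rough, hypothetical illustration of the general adaptive-gating idea (all names, shapes, and parameters below are assumptions, not the authors' code), a learned sigmoid gate can interpolate per-feature between a detail-rich local feature map and a global one:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_enhance(local_feat, global_feat, w_gate, b_gate):
    """Blend fine local detail with a global representation via a
    learned sigmoid gate (a generic stand-in, not MAFE itself)."""
    gate = sigmoid(local_feat @ w_gate + b_gate)      # per-feature gate in (0, 1)
    return gate * local_feat + (1.0 - gate) * global_feat

rng = np.random.default_rng(0)
local_feat = rng.standard_normal((4, 8))    # e.g. 4 spatial positions, 8 channels
global_feat = rng.standard_normal((4, 8))
w_gate = rng.standard_normal((8, 8)) * 0.1  # toy gate parameters
b_gate = np.zeros(8)

fused = gated_enhance(local_feat, global_feat, w_gate, b_gate)
print(fused.shape)  # (4, 8)
```

Because the gate lies in (0, 1), each fused value is an elementwise convex combination of the two inputs, which is one simple way a module can retain local detail where the gate opens and fall back to the global representation elsewhere.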

Published

2026-03-14

How to Cite

Wang, Y., Zhuang, R., Zheng, H., He, X., Cao, K., Tu, X., & Ding, X. (2026). Self-supervised Multiplex Consensus Mamba for General Image Fusion. Proceedings of the AAAI Conference on Artificial Intelligence, 40(22), 18647–18655. https://doi.org/10.1609/aaai.v40i22.38932

Section

AAAI Technical Track on Intelligent Robotics