MaskViM: Domain Generalized Semantic Segmentation with State Space Models

Authors

  • Jiahao Li, School of Informatics, Xiamen University
  • Yang Lu, School of Informatics, Xiamen University; Institute of Artificial Intelligence, Xiamen University; Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University
  • Yuan Xie, School of Computer Science and Technology, East China Normal University; Chongqing Institute of East China Normal University
  • Yanyun Qu, School of Informatics, Xiamen University; Institute of Artificial Intelligence, Xiamen University; Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University

DOI:

https://doi.org/10.1609/aaai.v39i5.32502

Abstract

Domain Generalized Semantic Segmentation (DGSS) aims to use a segmentation model trained on known source domains to make predictions on unknown target domains. Currently, there are two families of network architectures: one based on Convolutional Neural Networks (CNNs) and the other based on Vision Transformers (ViTs). However, both CNN-based and ViT-based DGSS methods face challenges: the former lack a global receptive field, while the latter incur a higher computational cost. Drawing inspiration from State Space Models (SSMs), which not only possess a global receptive field but also maintain linear complexity, we propose an SSM-based method for DGSS. In this work, we first explain why masking makes sense in SSM-based DGSS and propose our mask learning mechanism. Leveraging this mechanism, we present our Mask Vision Mamba network (MaskViM), a model for SSM-based DGSS, and design a mask loss to optimize MaskViM. Our method achieves superior performance in four diverse DGSS settings, which demonstrates its effectiveness.
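
The claim that SSMs combine a global receptive field with linear complexity can be illustrated with a minimal sketch. The code below is not MaskViM and not Mamba (which uses input-dependent parameters and multi-dimensional states). It is a scalar linear recurrence with hypothetical fixed parameters a, b, and c, chosen only for illustration. The single pass over the sequence costs time linear in its length, and each output depends on every earlier input through the carried state.

```python
def ssm_scan(xs, a=0.5, b=1.0, c=1.0):
    """Run a scalar linear state space recurrence over a sequence.

    h_t = a * h_{t-1} + b * x_t   (state update)
    y_t = c * h_t                 (readout)

    The parameters a, b, c are illustrative assumptions, not values
    from the paper.
    """
    h = 0.0
    ys = []
    for x in xs:  # one pass: O(len(xs)) time, O(1) state
        h = a * h + b * x
        ys.append(c * h)
    return ys


# An impulse at the first position still influences every later output,
# which is the sense in which the receptive field is global along the scan.
impulse = [1.0, 0.0, 0.0, 0.0]
print(ssm_scan(impulse))  # [1.0, 0.5, 0.25, 0.125]
```

In this toy setting the influence of an early input decays geometrically with a but never drops out of the computation, whereas a fixed-size convolution kernel would stop seeing it after a few steps.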

Published

2025-04-11

How to Cite

Li, J., Lu, Y., Xie, Y., & Qu, Y. (2025). MaskViM: Domain Generalized Semantic Segmentation with State Space Models. Proceedings of the AAAI Conference on Artificial Intelligence, 39(5), 4752–4760. https://doi.org/10.1609/aaai.v39i5.32502

Section

AAAI Technical Track on Computer Vision IV