MaskViM: Domain Generalized Semantic Segmentation with State Space Models
DOI:
https://doi.org/10.1609/aaai.v39i5.32502
Abstract
Domain Generalized Semantic Segmentation (DGSS) aims to train a segmentation model on known source domains so that it can make predictions on unseen target domains. Current methods build on one of two network architectures: Convolutional Neural Networks (CNNs) or Vision Transformers (ViTs). Both face challenges: CNN-based methods lack a global receptive field, while ViT-based methods incur a higher computational cost. Drawing inspiration from State Space Models (SSMs), which possess a global receptive field while maintaining linear complexity, we propose an SSM-based method for DGSS. In this work, we first explain why masking makes sense in SSM-based DGSS and propose our mask learning mechanism. Leveraging this mechanism, we present our Mask Vision Mamba network (MaskViM), a model for SSM-based DGSS, and design a mask loss to optimize it. Our method achieves superior performance in four diverse DGSS settings, which demonstrates its effectiveness.
Published
2025-04-11
How to Cite
Li, J., Lu, Y., Xie, Y., & Qu, Y. (2025). MaskViM: Domain Generalized Semantic Segmentation with State Space Models. Proceedings of the AAAI Conference on Artificial Intelligence, 39(5), 4752–4760. https://doi.org/10.1609/aaai.v39i5.32502
Section
AAAI Technical Track on Computer Vision IV