2D-CrossScan Mamba: Enhancing State Space Models with Spatially Consistent Multi-Path 2D Information Propagation

Authors

  • Longlong Yu (School of Communication Engineering, Hangzhou Dianzi University, Hangzhou, China; Zhuoxi Lab, Hangzhou, China)
  • Wenxi Li (KLATASDS-MOE, School of Statistics, East China Normal University, Shanghai, China)
  • Yaoqi Sun (Lishui University, Lishui, China)
  • Hang Xu (School of Communication Engineering, Hangzhou Dianzi University, Hangzhou, China)
  • Chenggang Yan (School of Communication Engineering, Hangzhou Dianzi University, Hangzhou, China)
  • Yuchen Guo (Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China)

DOI:

https://doi.org/10.1609/aaai.v40i21.38855

Abstract

Despite recent progress in adapting State Space Models such as Mamba to vision tasks, their intrinsic 1D scanning mechanism imposes limitations when applied to inherently 2D-structured data like images. Existing adaptations, including VMamba and 2DMamba, either suffer from inconsistency between scanning order and spatial locality or restrict inter-patch communication to singular paths, hindering effective information propagation. In this paper, we propose 2D-CrossScan, a novel 2D-compatible scan framework that enables spatially consistent, multi-path hidden state propagation by integrating modified state equations over two-dimensional neighborhoods. Furthermore, we mitigate redundant information accumulation due to overlapping paths via cross-directional subtraction. To fully align with the 2D spatial structure, we introduce a multi-directional scanning strategy that starts simultaneously from all four corners of the image, enabling diverse propagation paths and better feature integration. Our approach maintains efficiency, requiring only minimal architectural changes to existing Mamba variants. Experimental results demonstrate substantial improvements in multiple visual tasks, including object detection and semantic segmentation on PANDA and COCO datasets. Compared to baseline SSM-based methods, 2D-CrossScan consistently yields better spatial representations, as confirmed by extensive effective receptive field visualizations and attention analyses. These results highlight the importance of geometry-aware state propagation and validate 2D-CrossScan as a simple yet powerful extension to SSMs for vision.
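
The multi-path propagation with cross-directional subtraction described above can be illustrated with a small sketch. It is not the authors' implementation: the function names, the scalar decay `a`, the scalar input gain `b`, and the choice to sum the four corner scans are all assumptions made for illustration, and the paper's actual state equations may differ. Each cell receives the hidden state of its upper and left neighbours, and the diagonal neighbour's contribution is subtracted once because it reaches the cell through both paths.

```python
import numpy as np

def scan_2d(x, a=0.5, b=1.0):
    """Top-left to bottom-right 2D scan (illustrative, scalar parameters).

    Assumed recurrence:
        h[i, j] = a*h[i-1, j] + a*h[i, j-1] - a*a*h[i-1, j-1] + b*x[i, j]
    The subtracted term removes the state counted twice via overlapping paths.
    """
    rows, cols = x.shape
    h = np.zeros((rows + 1, cols + 1))  # zero border stands in for "no neighbour"
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            h[i, j] = (a * h[i - 1, j] + a * h[i, j - 1]
                       - a * a * h[i - 1, j - 1]
                       + b * x[i - 1, j - 1])
    return h[1:, 1:]

def four_corner_scan(x, a=0.5, b=1.0):
    """Run the scan from all four corners and sum the results (an assumption)."""
    out = np.zeros_like(x, dtype=float)
    for flip_rows in (False, True):
        for flip_cols in (False, True):
            v = x[::-1] if flip_rows else x          # choose the starting corner
            v = v[:, ::-1] if flip_cols else v
            s = scan_2d(v, a, b)
            s = s[:, ::-1] if flip_cols else s       # undo the flips
            s = s[::-1] if flip_rows else s
            out += s
    return out
```

With `a = 0.5`, a unit impulse at the top-left corner reaches its diagonal neighbour with weight `a**2 = 0.25`. Without the subtraction term the two overlapping paths would add up to `0.5`.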

Published

2026-03-14

How to Cite

Yu, L., Li, W., Sun, Y., Xu, H., Yan, C., & Guo, Y. (2026). 2D-CrossScan Mamba: Enhancing State Space Models with Spatially Consistent Multi-Path 2D Information Propagation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(21), 17957–17965. https://doi.org/10.1609/aaai.v40i21.38855

Issue

Vol. 40 No. 21 (2026)

Section

AAAI Technical Track on Humans and AI