SAM2MOT: A Novel Paradigm of Multi-Object Tracking by Segmentation

Authors

  • Junjie Jiang, Huawei Cloud
  • Zelin Wang, Huawei Cloud
  • Manqi Zhao, Huawei Cloud
  • Yin Li, Huawei Cloud
  • Dongsheng Jiang, Huawei Cloud

DOI:

https://doi.org/10.1609/aaai.v40i7.37455

Abstract

Inspired by Segment Anything 2, which generalizes segmentation from images to videos, we propose SAM2MOT, a novel segmentation-driven paradigm for multi-object tracking (MOT) that breaks away from the conventional detection-association framework. In contrast to previous approaches that treat segmentation as auxiliary information, SAM2MOT places it at the heart of the tracking process, systematically tackling challenges such as false positives and occlusions. Its effectiveness has been thoroughly validated on major MOT benchmarks. Furthermore, SAM2MOT integrates a pre-trained detector and a pre-trained segmentor with tracking logic into a zero-shot MOT system that requires no fine-tuning. This significantly reduces dependence on labeled data and paves the way for transitioning MOT research from task-specific solutions to general-purpose systems. Experiments on DanceTrack, UAVDT, and BDD100K show state-of-the-art results. Notably, SAM2MOT outperforms existing methods on DanceTrack by +2.1 HOTA and +4.5 IDF1, highlighting its effectiveness in MOT.

Published

2026-03-14

How to Cite

Jiang, J., Wang, Z., Zhao, M., Li, Y., & Jiang, D. (2026). SAM2MOT: A Novel Paradigm of Multi-Object Tracking by Segmentation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(7), 5388–5396. https://doi.org/10.1609/aaai.v40i7.37455

Section

AAAI Technical Track on Computer Vision IV