Segment Anything Across Shots: A Method and Benchmark
DOI:
https://doi.org/10.1609/aaai.v40i6.42485
Abstract
This work focuses on multi-shot semi-supervised video object segmentation (MVOS), which aims to segment the target object indicated by an initial mask throughout a video with multiple shots. Existing VOS methods mainly target single-shot videos and often fail to handle shot discontinuities, which limits their real-world applicability. Furthermore, the lack of annotated multi-shot data poses a major challenge for MVOS research. To address these issues, we propose a transition-mimicking data augmentation strategy (TMA) that enables cross-shot generalization using single-shot data, and a transition-aware method, Segment Anything Across Shots (SAAS), which detects and comprehends shot transitions during inference. To support evaluation and future study in MVOS, we introduce Cut-VOS, a new MVOS benchmark with dense mask annotations, diverse object categories, and high-frequency transitions. Extensive experiments on YouMVOS and Cut-VOS demonstrate that the proposed SAAS achieves state-of-the-art performance by effectively mimicking, understanding, and segmenting across complex transitions.
Published
2026-03-14
How to Cite
Hu, H., Ying, K., & Ding, H. (2026). Segment Anything Across Shots: A Method and Benchmark. Proceedings of the AAAI Conference on Artificial Intelligence, 40(6), 4825–4833. https://doi.org/10.1609/aaai.v40i6.42485
Issue
Section
AAAI Technical Track on Computer Vision III