Segment Anything Across Shots: A Method and Benchmark

Hengrui Hu; Kaining Ying; Henghui Ding

doi:10.1609/aaai.v40i6.42485

Authors

Hengrui Hu, Institute of Big Data, College of Computer Science and Artificial Intelligence, Fudan University, China
Kaining Ying, Institute of Big Data, College of Computer Science and Artificial Intelligence, Fudan University, China
Henghui Ding, Institute of Big Data, College of Computer Science and Artificial Intelligence, Fudan University, China

Abstract

This work focuses on multi-shot semi-supervised video object segmentation (MVOS), which aims to segment the target object indicated by an initial mask throughout a video with multiple shots. Existing VOS methods mainly focus on single-shot videos and often fail to handle shot discontinuities, which limits their real-world applicability. Furthermore, the lack of annotated multi-shot data poses a major challenge for MVOS research. To address these issues, we propose a transition-mimicking data augmentation strategy (TMA) that enables cross-shot generalization using single-shot data, and a transition-aware method, Segment Anything Across Shots (SAAS), which detects and comprehends shot transitions during inference. To support evaluation and future study in MVOS, we introduce Cut-VOS, a new MVOS benchmark with dense mask annotations, diverse object categories, and high-frequency transitions. Extensive experiments on YouMVOS and Cut-VOS demonstrate that the proposed SAAS achieves state-of-the-art performance by effectively mimicking, understanding, and segmenting across complex transitions.
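
The abstract does not say how TMA constructs its transitions, so the following is only a minimal sketch of the general idea of mimicking a shot cut from single-shot data, not the authors' implementation. The function name mimic_cut, the nested-list frame representation, and the use of a horizontal flip as a stand-in for an abrupt viewpoint change are all assumptions made for illustration. The point it shows is that the same transform must be applied to frames and masks from the cut onward, so the annotations stay aligned across the synthetic transition.

```python
import random
from typing import List, Tuple

Grid = List[List[int]]  # one frame or one mask, stored as rows of pixel values


def hflip(grid: Grid) -> Grid:
    """Mirror a frame or mask left to right."""
    return [row[::-1] for row in grid]


def mimic_cut(
    frames: List[Grid], masks: List[Grid], rng: random.Random
) -> Tuple[List[Grid], List[Grid], int]:
    """Insert one synthetic 'cut' into a single-shot clip (hypothetical helper).

    Frames before the cut index are left untouched; frames from the cut
    index onward are horizontally flipped, together with their masks.
    The flip is an assumed stand-in for a real shot transition.
    """
    if len(frames) != len(masks):
        raise ValueError("frames and masks must have the same length")
    if len(frames) < 2:
        raise ValueError("need at least two frames to place a cut")

    cut = rng.randrange(1, len(frames))  # the cut never lands on frame 0
    new_frames = frames[:cut] + [hflip(f) for f in frames[cut:]]
    new_masks = masks[:cut] + [hflip(m) for m in masks[cut:]]
    return new_frames, new_masks, cut


# Usage: a toy 5-frame clip with 1x3 frames and a mask on the left pixel.
clip_frames = [[[t, t + 1, t + 2]] for t in range(5)]
clip_masks = [[[1, 0, 0]] for _ in range(5)]
aug_frames, aug_masks, cut_index = mimic_cut(
    clip_frames, clip_masks, random.Random(0)
)
```

A real augmentation would likely draw from a richer family of transforms (crops, scale changes, splicing in a different clip), but the abstract gives no detail on which ones TMA uses, so none are assumed here.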
