Multi-Granularity Video Object Segmentation

Authors

  • Sangbeom Lim, Korea University
  • Seongchan Kim, Korea University
  • Seungjun An, Samsung Electronics
  • Seokju Cho, Korea Advanced Institute of Science & Technology
  • Paul Hongsuck Seo, Korea University
  • Seungryong Kim, Korea Advanced Institute of Science & Technology

DOI:

https://doi.org/10.1609/aaai.v39i5.32552

Abstract

Current benchmarks for video segmentation annotate only salient objects (i.e., foreground instances). Despite their impressive architectural designs, previous works trained on these benchmarks have struggled to adapt to real-world scenarios. A new video segmentation dataset aimed at tracking multi-granularity segmentation targets in the video scene is therefore necessary. In this work, we aim to generate a multi-granularity video segmentation dataset that is annotated with both salient and non-salient masks. To achieve this, we propose a large-scale, densely annotated multi-granularity video object segmentation (MUG-VOS) dataset that includes various types and granularities of mask annotations. We automatically collected a training set that assists in tracking both salient and non-salient objects, and we also curated a human-annotated test set for reliable evaluation. In addition, we present a memory-based mask propagation model (MMPM), trained and evaluated on the MUG-VOS dataset, which achieves the best performance among existing video object segmentation methods and Segment Anything Model (SAM)-based video segmentation methods.

Published

2025-04-11

How to Cite

Lim, S., Kim, S., An, S., Cho, S., Seo, P. H., & Kim, S. (2025). Multi-Granularity Video Object Segmentation. Proceedings of the AAAI Conference on Artificial Intelligence, 39(5), 5200–5208. https://doi.org/10.1609/aaai.v39i5.32552

Section

AAAI Technical Track on Computer Vision IV