Query-Memory Re-Aggregation for Weakly-supervised Video Object Segmentation

Fanchao Lin; Hongtao Xie; Yan Li; Yongdong Zhang

doi:10.1609/aaai.v35i3.16300

Authors

Fanchao Lin: School of Information Science and Technology, University of Science and Technology of China, Hefei, China
Hongtao Xie: School of Information Science and Technology, University of Science and Technology of China, Hefei, China
Yan Li: Beijing Kuaishou Technology Co., Ltd., Beijing, China; School of Information Science and Technology, University of Science and Technology of China, Hefei, China
Yongdong Zhang: School of Information Science and Technology, University of Science and Technology of China, Hefei, China

DOI:

https://doi.org/10.1609/aaai.v35i3.16300

Keywords:

Motion & Tracking, Segmentation

Abstract

Weakly-supervised video object segmentation (WVOS) is an emerging video task that tracks and segments a target given only a simple bounding-box label. However, existing WVOS methods remain unsatisfactory in either speed or accuracy, because they use only the exemplar frame to guide prediction and neglect references from other frames. To solve this problem, we propose a novel Re-Aggregation-based framework, which uses feature matching to efficiently find the target and captures the temporal dependencies across multiple frames to guide the segmentation. Built on a two-stage structure, our framework establishes an information-symmetric matching process to achieve robust aggregation. In each stage, we design a Query-Memory Aggregation (QMA) module that gathers features from past frames and performs bidirectional aggregation to adaptively weight the aggregated features, which alleviates the latent misguidance of unidirectional aggregation. To further exploit the information from different aggregation stages, we propose a novel coarse-fine constraint, using a Cascaded Refinement Module (CRM) to combine the predictions of different stages, which further boosts performance. Experimental results on three benchmarks show that our method achieves state-of-the-art performance in WVOS (e.g., an overall score of 84.7% on the DAVIS 2016 validation set).
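The abstract does not give the QMA equations, so the following is only a minimal sketch of the general idea under our own assumptions: attention-style feature matching between query-frame features and memory features from past frames, plus a hypothetical "round-trip" weight that uses backward (memory-to-query) attention to down-weight query positions whose matches are not mutual. The function names (`query_memory_aggregate`, `similarity`), the scaled dot-product similarity, and the round-trip weighting are illustrative choices, not the authors' formulation.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def similarity(query: Matrix, keys: Matrix) -> Matrix:
    """Scaled dot-product similarity: sim[i][j] = <q_i, k_j> / sqrt(d)."""
    scale = 1.0 / math.sqrt(len(query[0]))
    return [[scale * sum(a * b for a, b in zip(q, k)) for k in keys] for q in query]


def query_memory_aggregate(query: Matrix, mem_keys: Matrix,
                           mem_values: Matrix) -> Tuple[Matrix, List[float]]:
    """Hypothetical QMA-style aggregation (illustrative, not the paper's equations).

    query:      N_q feature vectors from the current (query) frame
    mem_keys:   N_m feature vectors from past (memory) frames
    mem_values: N_m vectors read out for each memory position
    Returns the weighted aggregated features and the per-position weights.
    """
    sim = similarity(query, mem_keys)
    n_q, n_m = len(query), len(mem_keys)
    # Forward attention: each query position distributes over memory positions.
    fwd = [softmax(row) for row in sim]
    # Backward attention: each memory position distributes over query positions.
    bwd = [softmax([sim[i][j] for i in range(n_q)]) for j in range(n_m)]

    aggregated: Matrix = []
    weights: List[float] = []
    for i in range(n_q):
        # Unidirectional read-out: attention-weighted sum of memory values.
        read = [sum(fwd[i][j] * mem_values[j][c] for j in range(n_m))
                for c in range(len(mem_values[0]))]
        # Round-trip probability q_i -> m_j -> q_i, always in [0, 1]; it is high
        # when the query position and its matched memory positions prefer each other.
        w = sum(fwd[i][j] * bwd[j][i] for j in range(n_m))
        aggregated.append([w * x for x in read])
        weights.append(w)
    return aggregated, weights


if __name__ == "__main__":
    # Toy 2-D features: query position 0 mutually matches memory position 0
    # (a foreground entry), while query position 1 is ambiguous.
    query = [[4.0, 0.0], [0.0, 0.1]]
    mem_keys = [[4.0, 0.0], [0.0, 4.0], [-4.0, 0.0]]
    mem_values = [[1.0], [0.0], [0.0]]  # 1.0 marks a foreground memory entry
    feats, weights = query_memory_aggregate(query, mem_keys, mem_values)
    print("weights:", [round(w, 3) for w in weights])
    print("aggregated:", [[round(x, 3) for x in f] for f in feats])
```

In this toy example the first query position, which matches a memory entry mutually, receives a weight close to 1, while the ambiguous second position receives a weight of roughly 0.53, so its read-out is damped. In a trained network the keys and values would typically come from learned feature projections; the fixed vectors here only illustrate the arithmetic of forward read-out plus backward re-weighting.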
