Query-Memory Re-Aggregation for Weakly-supervised Video Object Segmentation

Authors

  • Fanchao Lin, School of Information Science and Technology, University of Science and Technology of China, Hefei, China
  • Hongtao Xie, School of Information Science and Technology, University of Science and Technology of China, Hefei, China
  • Yan Li, Beijing Kuaishou Technology Co., Ltd., Beijing, China; School of Information Science and Technology, University of Science and Technology of China, Hefei, China
  • Yongdong Zhang, School of Information Science and Technology, University of Science and Technology of China, Hefei, China

DOI:

https://doi.org/10.1609/aaai.v35i3.16300

Keywords:

Motion & Tracking, Segmentation

Abstract

Weakly-supervised video object segmentation (WVOS) is an emerging video task in which the target is tracked and segmented given only a simple bounding box label. However, existing WVOS methods remain unsatisfactory in either speed or accuracy, since they use only the exemplar frame to guide prediction and neglect the reference information available in other frames. To solve this problem, we propose a novel Re-Aggregation based framework, which uses feature matching to efficiently find the target and captures temporal dependencies from multiple frames to guide the segmentation. Based on a two-stage structure, our framework builds an information-symmetric matching process to achieve robust aggregation. In each stage, we design a Query-Memory Aggregation (QMA) module that gathers features from past frames and performs bidirectional aggregation to adaptively weight the aggregated features, which relieves the latent misguidance of unidirectional aggregation. To further exploit the information from different aggregation stages, we propose a novel coarse-fine constraint that uses the Cascaded Refinement Module (CRM) to combine the predictions from different stages and further boost performance. Experimental results on three benchmarks show that our method achieves state-of-the-art performance in WVOS (e.g., an overall score of 84.7% on the DAVIS 2016 validation set).
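
The abstract describes gathering features from past frames by feature matching. As a rough illustration only, the sketch below shows a generic soft feature-matching read from a memory of past-frame features: each query location is compared with every memory location by dot product, and the memory values are averaged with softmax weights. This is not the paper's QMA module; the bidirectional weighting, the two-stage structure, and the CRM are not reproduced, and the function names (`softmax`, `aggregate_from_memory`), the dot-product similarity, and the toy features are all assumptions made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_from_memory(query_keys, memory_keys, memory_values):
    """Hypothetical helper: soft read of memory values for each query location.

    query_keys:    Q feature vectors from the current (query) frame
    memory_keys:   M feature vectors from past frames, flattened over space/time
    memory_values: M value vectors aligned one-to-one with memory_keys
    """
    dim = len(memory_values[0])
    aggregated = []
    for q in query_keys:
        # Dot-product similarity between this query location and every memory location
        # (an assumed similarity measure, chosen for simplicity).
        scores = [sum(a * b for a, b in zip(q, k)) for k in memory_keys]
        weights = softmax(scores)
        # Weighted sum of memory values: a one-directional (query-to-memory) read.
        read = [sum(w * v[d] for w, v in zip(weights, memory_values)) for d in range(dim)]
        aggregated.append(read)
    return aggregated


# Toy data: two memory locations, one labeled target (1.0) and one background (0.0).
memory_keys = [[1.0, 0.0], [0.0, 1.0]]
memory_values = [[1.0], [0.0]]
# Two query locations, each resembling one of the memory locations.
query_keys = [[5.0, 0.0], [0.0, 5.0]]

result = aggregate_from_memory(query_keys, memory_keys, memory_values)
```

In this toy case the first query location reads a value close to the target label and the second reads a value close to background, which is the behavior a matching-based read is meant to provide. A one-directional read like this trusts every match equally, which is the kind of misguidance the abstract says bidirectional aggregation is designed to relieve.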

Published

2021-05-18

How to Cite

Lin, F., Xie, H., Li, Y., & Zhang, Y. (2021). Query-Memory Re-Aggregation for Weakly-supervised Video Object Segmentation. Proceedings of the AAAI Conference on Artificial Intelligence, 35(3), 2038-2046. https://doi.org/10.1609/aaai.v35i3.16300

Section

AAAI Technical Track on Computer Vision II