Curriculum Multi-Negative Augmentation for Debiased Video Grounding

Xiaohan Lan; Yitian Yuan; Hong Chen; Xin Wang; Zequn Jie; Lin Ma; Zhi Wang; Wenwu Zhu

doi:10.1609/aaai.v37i1.25204

Authors

Xiaohan Lan, Tsinghua University
Yitian Yuan, Meituan Inc.
Hong Chen, Tsinghua University
Xin Wang, Tsinghua University
Zequn Jie, Meituan Inc.
Lin Ma, Meituan Inc.
Zhi Wang, Tsinghua University
Wenwu Zhu, Tsinghua University

DOI:

https://doi.org/10.1609/aaai.v37i1.25204

Keywords:

CV: Video Understanding & Activity Analysis, CV: Language and Vision, CV: Multi-modal Vision

Abstract

Video Grounding (VG) aims to locate the desired segment in a video given a sentence query. Recent studies have found that current VG models are prone to over-rely on the distribution biases of the ground-truth moment annotations in the training set. To discourage the standard VG model from exploiting such temporal annotation biases and to improve its generalization ability, we propose multiple negative augmentations organized hierarchically, including cross-video augmentations at the clip and video levels, and self-shuffled augmentations with masks. These augmentations can effectively diversify the data distribution so that the model can make more reasonable predictions instead of merely fitting the temporal biases. However, directly adopting such a data augmentation strategy may inevitably introduce some noise, as shown in our cases, since not all of the handcrafted augmentations are semantically irrelevant to the ground-truth video. To further denoise and improve the grounding accuracy, we design a multi-stage curriculum strategy to adaptively train the standard VG model from easy to hard negative augmentations. Experiments on the newly collected Charades-CD and ActivityNet-CD datasets demonstrate that our proposed strategy can improve the performance of the base model in both i.i.d. and o.o.d. scenarios.
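
As a rough illustration of the ideas named in the abstract, the following Python sketch builds three kinds of negative samples from a video represented as a list of clip identifiers and selects which kinds are active at a given training epoch. It is not the authors' implementation: every function name, the mask token, the mask ratio, and in particular the easy-to-hard ordering in `CURRICULUM` are assumptions made here for illustration, and the abstract does not specify any of them.

```python
import random

# Hypothetical helpers; names and behavior are illustrative assumptions,
# not the method described in the paper.

def cross_video_clip_negative(video, other, start, end):
    """Clip-level negative: replace the annotated span [start, end)
    with clips taken from a different video."""
    span = end - start
    filler = [other[i % len(other)] for i in range(span)]
    return video[:start] + filler + video[end:]

def cross_video_negative(other, length):
    """Video-level negative: use a different video entirely,
    looped or cropped to the requested number of clips."""
    return [other[i % len(other)] for i in range(length)]

def self_shuffled_masked_negative(video, mask_ratio, rng, mask_token="<mask>"):
    """Self-shuffled negative: permute the clips of the same video,
    then replace a fraction of them with a mask token."""
    shuffled = list(video)
    rng.shuffle(shuffled)
    n_mask = int(len(shuffled) * mask_ratio)
    for i in rng.sample(range(len(shuffled)), n_mask):
        shuffled[i] = mask_token
    return shuffled

# Assumed easy-to-hard ordering; the real stage design may differ.
CURRICULUM = [
    ["video_level"],
    ["video_level", "clip_level"],
    ["video_level", "clip_level", "self_shuffled"],
]

def augmentations_for_epoch(epoch, epochs_per_stage):
    """Return the augmentation types enabled at this epoch."""
    stage = min(epoch // epochs_per_stage, len(CURRICULUM) - 1)
    return CURRICULUM[stage]

if __name__ == "__main__":
    rng = random.Random(0)
    video = ["a0", "a1", "a2", "a3", "a4", "a5"]
    other = ["b0", "b1", "b2"]
    print(cross_video_clip_negative(video, other, 2, 4))
    print(cross_video_negative(other, len(video)))
    print(self_shuffled_masked_negative(video, 0.5, rng))
    print(augmentations_for_epoch(3, 2))
```

In a real system the clip identifiers would be feature vectors, and each negative sample would be paired with the original query and penalized by whatever loss the base VG model uses; those parts are left out because the abstract does not describe them.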
