Momentum Pseudo-Labeling for Weakly Supervised Phrase Grounding

Dongdong Kuang; Richong Zhang; Zhijie Nie; Junfan Chen; Jaein Kim

doi:10.1609/aaai.v39i23.34612

Authors

Dongdong Kuang: CCSE, School of Computer Science and Engineering, Beihang University, Beijing, China
Richong Zhang: CCSE, School of Computer Science and Engineering, Beihang University, Beijing, China; Zhongguancun Laboratory, Beijing, China
Zhijie Nie: CCSE, School of Computer Science and Engineering, Beihang University, Beijing, China; Shen Yuan Honors College, Beihang University, Beijing, China
Junfan Chen: CCSE, School of Computer Science and Engineering, Beihang University, Beijing, China; School of Software, Beihang University, Beijing, China
Jaein Kim: CCSE, School of Computer Science and Engineering, Beihang University, Beijing, China

DOI:

https://doi.org/10.1609/aaai.v39i23.34612

Abstract

Weakly supervised phrase grounding aims to learn alignments between phrases and image regions from coarse image-caption matching information alone. One branch of previous methods establishes pseudo-label relationships between phrases and regions using the Expectation-Maximization (EM) algorithm combined with contrastive learning. However, these methods update pseudo-labels in the E-step only locally, at the batch level, which is sub-optimal, while extending the update to all pseudo-labels globally is computationally expensive. In addition, they do not account for potential false negative examples in the contrastive loss, which reduces the effectiveness of M-step optimization. To address these issues, we propose Momentum Pseudo-Labeling (MPL), which uses a momentum model to keep global pseudo-label updates synchronized on the fly with model parameter updates. We also explore potential relationships between phrases and regions from non-matching image-caption pairs and convert these false negative examples into positive ones in contrastive learning. Our approach achieves state-of-the-art performance on three commonly used datasets for weakly supervised phrase grounding.
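
The abstract does not give the update rules, so the following is only a minimal sketch of the three ideas it names, under stated assumptions: the momentum model is taken to be an exponential moving average (EMA) of the online parameters, pseudo-labels are taken to be the highest-scoring region per phrase, and false negatives are taken to be high-similarity pairs from non-matching image-caption pairs. The function names, the coefficient `m`, and the `threshold` rule are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: the EMA rule, the argmax labeling, the threshold
# rule, and every name below are assumptions, not the authors' implementation.

def ema_update(momentum_params, online_params, m=0.99):
    """Move each momentum parameter a small step toward the online one."""
    return [m * p_m + (1.0 - m) * p_o
            for p_m, p_o in zip(momentum_params, online_params)]


def pseudo_labels(similarity):
    """For each phrase (row), return the index of its best-scoring region."""
    return [max(range(len(row)), key=row.__getitem__) for row in similarity]


def positive_mask(similarity, matched, threshold=0.8):
    """Mark the phrase-region pairs treated as positives in a contrastive loss.

    matched[i][j] is True when phrase i and region j come from a matching
    image-caption pair. Pairs from non-matching images whose similarity
    exceeds `threshold` are also marked positive (assumed false-negative rule).
    """
    return [[bool(matched[i][j] or s > threshold) for j, s in enumerate(row)]
            for i, row in enumerate(similarity)]


# Toy usage: 2 phrases scored against 3 regions by the momentum model.
momentum = ema_update([0.0, 1.0], [1.0, 1.0], m=0.9)
sim = [[0.1, 0.7, 0.2], [0.9, 0.3, 0.85]]
matched = [[False, True, False], [True, False, False]]
labels = pseudo_labels(sim)
mask = positive_mask(sim, matched, threshold=0.8)
```

In this toy setup the momentum parameters change slowly, so labels produced from them can be refreshed at every training step without a separate pass over the whole dataset, which is the efficiency argument the abstract makes.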
