Actionness Inconsistency-Guided Contrastive Learning for Weakly-Supervised Temporal Action Localization

Authors

  • Zhilin Li, University of Science and Technology of China
  • Zilei Wang, University of Science and Technology of China
  • Qinying Liu, University of Science and Technology of China

DOI:

https://doi.org/10.1609/aaai.v37i2.25237

Keywords:

CV: Video Understanding & Activity Analysis, ML: Representation Learning

Abstract

Weakly-supervised temporal action localization (WTAL) aims to detect action instances given only video-level labels. To address this challenge, recent methods commonly employ a two-branch framework consisting of a class-aware branch and a class-agnostic branch. In principle, the two branches should produce the same actionness activation. In practice, however, we observe many regions where their activations are inconsistent. These inconsistent regions usually contain challenging segments whose semantic information (action or background) is ambiguous. In this work, we propose a novel Actionness Inconsistency-guided Contrastive Learning (AICL) method, which uses the consistent segments to boost the representation learning of the inconsistent segments. Specifically, we first define the consistent and inconsistent segments by comparing the predictions of the two branches, and then construct positive and negative pairs between consistent and inconsistent segments for contrastive learning. In addition, to avoid the trivial case where there is no consistent sample, we introduce an action consistency constraint to control the difference between the two branches. We conduct extensive experiments on the THUMOS14, ActivityNet v1.2, and ActivityNet v1.3 datasets, and the results show that AICL is effective and achieves state-of-the-art performance. Our code is available at https://github.com/lizhilin-ustc/AAAI2023-AICL.
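
The two steps named in the abstract (splitting segments by branch agreement, then contrasting inconsistent segments against consistent ones) can be illustrated with a minimal sketch. This is not the authors' implementation, which is in the repository linked above. The function names split_segments and info_nce, the 0.5 threshold, the temperature of 0.1, and the use of a generic InfoNCE-style loss are all assumptions made for illustration; the abstract does not specify how agreement is thresholded or how positives are assigned to each inconsistent segment.

```python
import numpy as np


def split_segments(class_aware, class_agnostic, threshold=0.5):
    """Split segments by whether the two branches agree (hypothetical rule).

    Both inputs are 1-D arrays of per-segment actionness scores in [0, 1].
    The fixed threshold is an assumption, not taken from the paper.
    """
    a = class_aware >= threshold
    b = class_agnostic >= threshold
    consistent_action = a & b          # both branches say "action"
    consistent_background = ~a & ~b    # both branches say "background"
    inconsistent = a ^ b               # the branches disagree
    return consistent_action, consistent_background, inconsistent


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE-style loss for one anchor feature (assumed loss form).

    anchor, positive: 1-D feature vectors; negatives: 2-D array, one row each.
    Features are L2-normalised so the logits are scaled cosine similarities.
    """
    anchor = anchor / np.linalg.norm(anchor)
    positive = positive / np.linalg.norm(positive)
    negatives = negatives / np.linalg.norm(negatives, axis=1, keepdims=True)
    pos_logit = anchor @ positive / temperature
    neg_logits = negatives @ anchor / temperature
    logits = np.concatenate(([pos_logit], neg_logits))
    # Numerically stable log-sum-exp, then subtract the positive logit.
    m = logits.max()
    return float(m + np.log(np.exp(logits - m).sum()) - pos_logit)
```

In this sketch, an inconsistent segment's feature would serve as the anchor, with the positive and negatives drawn from the consistent action and consistent background segments. Which consistent group acts as the positive for a given inconsistent segment is a design decision of the method that the abstract leaves unstated, so the sketch leaves that choice to the caller.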

Published

2023-06-26

How to Cite

Li, Z., Wang, Z., & Liu, Q. (2023). Actionness Inconsistency-Guided Contrastive Learning for Weakly-Supervised Temporal Action Localization. Proceedings of the AAAI Conference on Artificial Intelligence, 37(2), 1513-1521. https://doi.org/10.1609/aaai.v37i2.25237

Section

AAAI Technical Track on Computer Vision II