Truncate-Split-Contrast: A Framework for Learning from Mislabeled Videos
DOI: https://doi.org/10.1609/aaai.v37i3.25375
Keywords: CV: Video Understanding & Activity Analysis
Abstract
Learning with noisy labels is a classic problem that has been extensively studied for image tasks, but far less for video. A straightforward migration from images to videos that ignores temporal semantics and computational cost is not a sound choice. In this paper, we propose two new strategies for video analysis with noisy labels: 1) a lightweight channel selection method, dubbed Channel Truncation, for feature-based label noise detection, which selects the most discriminative channels to split clean and noisy instances within each category; and 2) a novel contrastive strategy, dubbed Noise Contrastive Learning, which constructs relationships between clean and noisy instances to regularize model training. Experiments on three well-known benchmark datasets for video classification show that our proposed truNcatE-split-contrAsT (NEAT) significantly outperforms existing baselines. While reducing the feature dimension to 10% of the original, our method achieves a noise detection F1-score above 0.4 and a 5% classification accuracy improvement on the Mini-Kinetics dataset under severe noise (symmetric-80%). Thanks to Noise Contrastive Learning, the average classification accuracy improvement on Mini-Kinetics and Sth-Sth-V1 exceeds 1.6%.
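The abstract's first strategy, Channel Truncation, keeps only the most discriminative feature channels per category and uses them to separate likely-clean from likely-noisy instances. A minimal sketch of that idea is below; it is not the paper's implementation, and the channel-ranking criterion (deviation of the class mean from the global mean), the median-distance split, and the function name are all illustrative assumptions.

```python
import numpy as np

def truncate_and_split(features, labels, keep_ratio=0.1, num_classes=10):
    """Hypothetical sketch of per-class channel truncation.

    For each class: keep the keep_ratio fraction of channels where the
    class mean deviates most from the global mean (an assumed stand-in
    for "most discriminative"), then flag instances close to the class
    centroid on those channels as likely clean.
    """
    clean_mask = np.zeros(len(labels), dtype=bool)
    k = max(1, int(features.shape[1] * keep_ratio))
    global_mean = features.mean(axis=0)
    for c in range(num_classes):
        idx = np.where(labels == c)[0]
        if len(idx) == 0:
            continue
        class_mean = features[idx].mean(axis=0)
        # rank channels by how strongly this class stands out globally
        top = np.argsort(-np.abs(class_mean - global_mean))[:k]
        # distance to the truncated class centroid per instance
        d = np.linalg.norm(features[np.ix_(idx, top)] - class_mean[top], axis=1)
        # split at the median distance (a placeholder threshold)
        clean_mask[idx[d <= np.median(d)]] = True
    return clean_mask
```

Ranking and distance computation both run on a k-dimensional slice rather than the full feature vector, which is where the "10% of the original dimension" cost saving mentioned in the abstract would come from.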
Published
2023-06-26
How to Cite
Wang, Z., Weng, J., Yuan, C., & Wang, J. (2023). Truncate-Split-Contrast: A Framework for Learning from Mislabeled Videos. Proceedings of the AAAI Conference on Artificial Intelligence, 37(3), 2751-2758. https://doi.org/10.1609/aaai.v37i3.25375
Section: AAAI Technical Track on Computer Vision III