Truncate-Split-Contrast: A Framework for Learning from Mislabeled Videos
DOI: https://doi.org/10.1609/aaai.v37i3.25375
Keywords: CV: Video Understanding & Activity Analysis
Abstract
Learning with noisy labels is a classic problem that has been extensively studied for image tasks, but far less for video. A straightforward migration from images to videos that ignores temporal semantics and computational cost is not a sound choice. In this paper, we propose two new strategies for video analysis with noisy labels: 1) a lightweight channel selection method, dubbed Channel Truncation, for feature-based label noise detection, which selects the most discriminative channels to split clean and noisy instances within each category; and 2) a novel contrastive strategy, dubbed Noise Contrastive Learning, which constructs relationships between clean and noisy instances to regularize model training. Experiments on three well-known benchmark datasets for video classification show that our proposed truNcatE-split-contrAsT (NEAT) significantly outperforms existing baselines. While reducing the feature dimension to 10% of the original, our method achieves a noise detection F1-score above 0.4 and a 5% classification accuracy improvement on the Mini-Kinetics dataset under severe noise (symmetric-80%). Thanks to Noise Contrastive Learning, the average classification accuracy improvement on Mini-Kinetics and Sth-Sth-V1 exceeds 1.6%.
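The abstract's first strategy, Channel Truncation, keeps only the most discriminative feature channels per category and uses them to separate likely-clean from likely-noisy instances. A minimal sketch of that idea is below; it is not the paper's implementation, and the channel-ranking criterion (deviation of the class mean from the global mean), the median-distance split, and the function name are all illustrative assumptions.

```python
import numpy as np

def truncate_and_split(features, labels, keep_ratio=0.1, num_classes=10):
    """Hypothetical sketch of per-class channel truncation.

    For each class: keep the keep_ratio fraction of channels where the
    class mean deviates most from the global mean (an assumed stand-in
    for "most discriminative"), then flag instances close to the class
    centroid on those channels as likely clean.
    """
    clean_mask = np.zeros(len(labels), dtype=bool)
    k = max(1, int(features.shape[1] * keep_ratio))
    global_mean = features.mean(axis=0)
    for c in range(num_classes):
        idx = np.where(labels == c)[0]
        if len(idx) == 0:
            continue
        class_mean = features[idx].mean(axis=0)
        # rank channels by how strongly this class stands out globally
        top = np.argsort(-np.abs(class_mean - global_mean))[:k]
        # distance to the truncated class centroid per instance
        d = np.linalg.norm(features[np.ix_(idx, top)] - class_mean[top], axis=1)
        # split at the median distance (a placeholder threshold)
        clean_mask[idx[d <= np.median(d)]] = True
    return clean_mask
```

Ranking and distance computation both run on a k-dimensional slice rather than the full feature vector, which is where the "10% of the original dimension" cost saving mentioned in the abstract would come from.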
Published
2023-06-26
How to Cite
Wang, Z., Weng, J., Yuan, C., & Wang, J. (2023). Truncate-Split-Contrast: A Framework for Learning from Mislabeled Videos. Proceedings of the AAAI Conference on Artificial Intelligence, 37(3), 2751-2758. https://doi.org/10.1609/aaai.v37i3.25375
Section: AAAI Technical Track on Computer Vision III