Semi-supervised Active Learning for Video Action Detection

Authors

  • Ayush Singh, IIT (ISM) Dhanbad
  • Aayush J Rana, University of Central Florida
  • Akash Kumar, University of Central Florida
  • Shruti Vyas, University of Central Florida
  • Yogesh Singh Rawat, University of Central Florida

DOI:

https://doi.org/10.1609/aaai.v38i5.28292

Keywords:

CV: Video Understanding & Activity Analysis, CV: Scene Analysis & Understanding

Abstract

In this work, we focus on label-efficient learning for video action detection. We develop a novel semi-supervised active learning approach which utilizes both labeled and unlabeled data along with informative sample selection for action detection. Video action detection requires spatio-temporal localization along with classification, which poses several challenges for both active learning (informative sample selection) and semi-supervised learning (pseudo-label generation). First, we propose NoiseAug, a simple augmentation strategy which effectively selects informative samples for video action detection. Next, we propose fft-attention, a novel technique based on high-pass filtering which enables effective utilization of pseudo labels for SSL in video action detection by emphasizing relevant activity regions within a video. We evaluate the proposed approach on three different benchmark datasets, UCF101-24, JHMDB-21, and YouTube-VOS. First, we demonstrate its effectiveness on video action detection, where the proposed approach outperforms prior works in semi-supervised and weakly-supervised learning along with several baseline approaches on both UCF101-24 and JHMDB-21. Next, we also show its effectiveness on YouTube-VOS for video object segmentation, demonstrating its generalization capability for other dense prediction tasks in videos.
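The abstract names two components but this page gives no implementation details, so the following is only a minimal illustrative sketch of the two ideas in PyTorch, under assumptions: a (B, C, T, H, W) feature layout, a high-pass FFT mask as the "fft-attention" map, and prediction variance under noise augmentation as the NoiseAug selection score. The function names, the cutoff, and the noise scale are hypothetical and not the authors' released code.

```python
# Illustrative sketch only (assumptions, not the authors' implementation):
# 1) fft_highpass_attention: emphasize high-frequency (activity/boundary)
#    regions of a video feature map via an FFT high-pass filter.
# 2) noiseaug_uncertainty: score a clip for active learning by prediction
#    disagreement under small Gaussian noise augmentations.
import torch
import torch.fft


def fft_highpass_attention(feat: torch.Tensor, cutoff: float = 0.1) -> torch.Tensor:
    """feat: (B, C, T, H, W) features -> (B, 1, T, H, W) attention in [0, 1]."""
    # Spatial FFT of each frame, shifted so low frequencies sit in the center.
    spec = torch.fft.fftshift(torch.fft.fft2(feat, dim=(-2, -1)), dim=(-2, -1))

    # Zero out a centered low-frequency box of relative size `cutoff`.
    B, C, T, H, W = feat.shape
    h0, w0 = int(H * cutoff), int(W * cutoff)
    mask = torch.ones(H, W, device=feat.device)
    mask[H // 2 - h0:H // 2 + h0, W // 2 - w0:W // 2 + w0] = 0.0

    # Back to the spatial domain; keep the high-frequency energy per location.
    high = torch.fft.ifft2(torch.fft.ifftshift(spec * mask, dim=(-2, -1)), dim=(-2, -1))
    energy = high.abs().mean(dim=1, keepdim=True)  # (B, 1, T, H, W)

    # Normalize per clip to [0, 1] so it can weight pseudo-label losses.
    flat = energy.flatten(2)
    mn = flat.min(-1, keepdim=True).values
    mx = flat.max(-1, keepdim=True).values
    return ((flat - mn) / (mx - mn + 1e-6)).view_as(energy)


def noiseaug_uncertainty(model, clip: torch.Tensor, n_aug: int = 4, sigma: float = 0.05) -> float:
    """Higher score = predictions disagree more under noise = more informative to label."""
    with torch.no_grad():
        preds = torch.stack([
            torch.sigmoid(model(clip + sigma * torch.randn_like(clip)))
            for _ in range(n_aug)
        ])
    return preds.var(dim=0).mean().item()
```

Under these assumptions, the attention map would weight the pixel-wise pseudo-label loss on unlabeled clips, while the uncertainty score would rank unlabeled clips for the next annotation round.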

Published

2024-03-24

How to Cite

Singh, A., Rana, A. J., Kumar, A., Vyas, S., & Rawat, Y. S. (2024). Semi-supervised Active Learning for Video Action Detection. Proceedings of the AAAI Conference on Artificial Intelligence, 38(5), 4891-4899. https://doi.org/10.1609/aaai.v38i5.28292

Issue

Section

AAAI Technical Track on Computer Vision IV