Iterative Contrast-Classify for Semi-supervised Temporal Action Segmentation

Dipika Singhania; Rahul Rahaman; Angela Yao

doi:10.1609/aaai.v36i2.20124

Authors

Dipika Singhania National University of Singapore
Rahul Rahaman National University of Singapore
Angela Yao National University of Singapore

DOI:

https://doi.org/10.1609/aaai.v36i2.20124

Keywords:

Computer Vision (CV)

Abstract

Temporal action segmentation classifies the action of each frame in (long) video sequences. Due to the high cost of frame-wise labeling, we propose the first semi-supervised method for temporal action segmentation. Our method hinges on unsupervised representation learning, which, for temporal action segmentation, poses unique challenges. Actions in untrimmed videos vary in length and have unknown labels and start/end times. Ordering of actions across videos may also vary. We propose a novel way to learn frame-wise representations from temporal convolutional networks (TCNs) by clustering input features with added time-proximity conditions and multi-resolution similarity. By merging representation learning with conventional supervised learning, we develop an "Iterative Contrast-Classify (ICC)" semi-supervised learning scheme. With more labelled data, ICC progressively improves in performance; ICC semi-supervised learning, with 40% labelled videos, performs similarly to fully-supervised counterparts. Our ICC improves MoF by +1.8%, +5.6%, and +2.5% on Breakfast, 50Salads, and GTEA, respectively, for 100% labelled videos.
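The abstract reports its gains in MoF, which is commonly read as "mean over frames": the percentage of frames whose predicted action label matches the ground truth. The following is a minimal sketch of that metric for a single video, not the authors' evaluation code. The function name `mean_over_frames` and the toy labels are hypothetical, and the sketch assumes labels are already aligned frame by frame; benchmark protocols may pool frames across videos or splits differently.

```python
from typing import Sequence


def mean_over_frames(predicted: Sequence[str], ground_truth: Sequence[str]) -> float:
    """Return the percentage of frames whose predicted label equals the true label.

    Hypothetical helper for illustration; assumes both sequences describe the
    same video and are aligned frame by frame.
    """
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same length")
    if len(ground_truth) == 0:
        raise ValueError("at least one frame is required")
    # Count the frames where prediction and annotation agree.
    correct = sum(p == g for p, g in zip(predicted, ground_truth))
    return 100.0 * correct / len(ground_truth)


# Toy example (made-up labels): one of five frames is misclassified.
pred = ["pour", "pour", "stir", "stir", "stir"]
true = ["pour", "pour", "pour", "stir", "stir"]
score = mean_over_frames(pred, true)
print(f"MoF: {score:.1f}%")
```

Because every frame counts equally, long actions dominate this score, which is one reason segmentation papers often report it alongside segment-level metrics.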
