Exposing the Self-Supervised Space-Time Correspondence Learning via Graph Kernels

Zheyun Qin; Xiankai Lu; Xiushan Nie; Yilong Yin; Jianbing Shen

doi:10.1609/aaai.v37i2.25304

Authors

Zheyun Qin Shandong University
Xiankai Lu Shandong University
Xiushan Nie Shandong Jianzhu University
Yilong Yin Shandong University
Jianbing Shen University of Macau

DOI:

https://doi.org/10.1609/aaai.v37i2.25304

Keywords:

CV: Video Understanding & Activity Analysis, CV: Representation Learning for Vision, CV: Scene Analysis & Understanding, CV: Segmentation

Abstract

Self-supervised space-time correspondence learning is emerging as a promising way of leveraging unlabeled video. Most current methods adapt contrastive learning with negative-sample mining, or reconstruction, from the image domain, which requires dense affinity across multiple frames or optical flow constraints. Moreover, video correspondence models need to exploit more of the inherent properties of videos, such as structural information. In this work, we propose VideoHiGraph, a space-time correspondence framework based on a learnable graph kernel. Treating the video as a spatial-temporal graph, VideoHiGraph derives its learning objectives in a self-supervised manner by predicting unobserved hidden graphs via graph kernels. We learn a representation of the temporal coherence across frames in which pairwise similarity defines the structured hidden graph, such that a biased random walk graph kernel along the sub-graph can predict long-range correspondence. Then, we learn a refined representation across frames at the node level via a dense graph kernel. The self-supervision for model training comes from the structural and temporal consistency of the graph. VideoHiGraph achieves superior performance and demonstrates its robustness across label propagation benchmarks involving objects, semantic parts, keypoints, and instances. Our implementation is publicly available at https://github.com/zyqin19/VideoHiGraph.
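
The idea that pairwise similarity between frames defines a graph, and that a random walk along it yields long-range correspondence, can be illustrated with a small sketch. This is not the VideoHiGraph implementation and it omits the biased walk and the dense node-level kernel; it only shows a plain random walk over softmax-normalized cosine similarities. The function names, the `temperature` value, and the toy node features are all assumptions made for illustration, and node features are assumed to be nonzero vectors.

```python
import math


def softmax(row, temperature=0.1):
    """Turn a row of similarities into a probability distribution."""
    scaled = [v / temperature for v in row]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def normalize(vec):
    """Scale a nonzero vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def transition_matrix(nodes_a, nodes_b, temperature=0.1):
    """Row-stochastic matrix of walk probabilities from nodes_a to nodes_b,
    built from pairwise cosine similarity."""
    a = [normalize(v) for v in nodes_a]
    b = [normalize(v) for v in nodes_b]
    return [
        softmax([sum(x * y for x, y in zip(u, w)) for w in b], temperature)
        for u in a
    ]


def matmul(p, q):
    """Plain matrix product of two nested lists."""
    return [
        [sum(p[i][k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
        for i in range(len(p))
    ]


def long_range_walk(frames, temperature=0.1):
    """Chain frame-to-frame transitions from the first frame to the last."""
    walk = transition_matrix(frames[0], frames[1], temperature)
    for t in range(1, len(frames) - 1):
        step = transition_matrix(frames[t], frames[t + 1], temperature)
        walk = matmul(walk, step)
    return walk


# Toy data (hypothetical): three frames, two nodes each, 2-D features that drift slowly.
frames = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.9, 0.1], [0.1, 0.9]],
    [[0.8, 0.2], [0.2, 0.8]],
]
walk = long_range_walk(frames)
```

Each row of `walk` is a distribution over nodes in the last frame, so `walk[i][j]` reads as the probability that node `i` of the first frame corresponds to node `j` of the last. In this toy case each node keeps most of its probability mass on its own drifting counterpart.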
