Exposing the Self-Supervised Space-Time Correspondence Learning via Graph Kernels

Authors

  • Zheyun Qin Shandong University
  • Xiankai Lu Shandong University
  • Xiushan Nie Shandong Jianzhu University
  • Yilong Yin Shandong University
  • Jianbing Shen University of Macau

DOI:

https://doi.org/10.1609/aaai.v37i2.25304

Keywords:

CV: Video Understanding & Activity Analysis, CV: Representation Learning for Vision, CV: Scene Analysis & Understanding, CV: Segmentation

Abstract

Self-supervised space-time correspondence learning is emerging as a promising way of leveraging unlabeled video. Most current methods adapt contrastive learning (with negative-sample mining) or reconstruction objectives from the image domain, which require dense affinity across multiple frames or optical flow constraints. Moreover, predicting video correspondence calls for mining more of the inherent properties of video, such as structural information. In this work, we propose VideoHiGraph, a space-time correspondence framework based on a learnable graph kernel. Regarding a video as a spatial-temporal graph, VideoHiGraph derives its learning objectives in a self-supervised manner by predicting unobserved hidden graphs with graph kernels. We first learn a representation of the temporal coherence across frames in which pairwise similarity defines the structured hidden graph, such that a biased random walk graph kernel along the sub-graph can predict long-range correspondence. Then, we learn a refined representation across frames at the node level via a dense graph kernel. The self-supervision for model training comes from the structural and temporal consistency of the graph. VideoHiGraph achieves superior performance and demonstrates its robustness across label propagation benchmarks involving objects, semantic parts, keypoints, and instances. Our implementation is publicly available at https://github.com/zyqin19/VideoHiGraph.
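
The idea that pairwise similarity between frame nodes defines a graph, and that a random walk along it yields long-range correspondence, can be illustrated with a minimal sketch. This is not the VideoHiGraph implementation: the function names (`transition`, `long_range_walk`), the cosine-softmax affinity, and the `temperature` value are assumptions made for illustration, and the learnable kernel and the bias of the walk described in the abstract are not modeled.

```python
import math

def normalize(v):
    """Scale a feature vector to unit L2 norm (zero vectors are left as-is)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def transition(frame_a, frame_b, temperature=0.1):
    """Row-stochastic affinity from each node of frame_a to the nodes of frame_b.

    Assumption: affinity is a softmax over cosine similarities.
    """
    a = [normalize(v) for v in frame_a]
    b = [normalize(v) for v in frame_b]
    rows = []
    for u in a:
        logits = [sum(x * y for x, y in zip(u, w)) / temperature for w in b]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(l - peak) for l in logits]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows

def matmul(p, q):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))]
            for i in range(len(p))]

def long_range_walk(frames, temperature=0.1):
    """Chain one-step transitions into a first-to-last-frame correspondence.

    `frames` is a list of at least two frames; each frame is a list of
    node feature vectors.
    """
    walk = transition(frames[0], frames[1], temperature)
    for t in range(1, len(frames) - 1):
        walk = matmul(walk, transition(frames[t], frames[t + 1], temperature))
    return walk

# Toy example: two nodes per frame, features drift slightly in the middle frame.
frames = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.9, 0.1], [0.1, 0.9]],
    [[1.0, 0.0], [0.0, 1.0]],
]
walk = long_range_walk(frames)
```

Each row of `walk` is a probability distribution over the nodes of the last frame, so the largest entry in a row can be read as that node's predicted long-range correspondence.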

Published

2023-06-26

How to Cite

Qin, Z., Lu, X., Nie, X., Yin, Y., & Shen, J. (2023). Exposing the Self-Supervised Space-Time Correspondence Learning via Graph Kernels. Proceedings of the AAAI Conference on Artificial Intelligence, 37(2), 2110-2118. https://doi.org/10.1609/aaai.v37i2.25304

Section

AAAI Technical Track on Computer Vision II