Contrastive Transformation for Self-supervised Correspondence Learning

Authors

  • Ning Wang (CAS Key Laboratory of GIPAS, EEIS Department, University of Science and Technology of China)
  • Wengang Zhou (CAS Key Laboratory of GIPAS, EEIS Department, University of Science and Technology of China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center)
  • Houqiang Li (CAS Key Laboratory of GIPAS, EEIS Department, University of Science and Technology of China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center)

DOI:

https://doi.org/10.1609/aaai.v35i11.17220

Keywords:

Unsupervised & Self-Supervised Learning

Abstract

In this paper, we focus on the self-supervised learning of visual correspondence from unlabeled videos in the wild. Our method simultaneously considers intra- and inter-video representation associations for reliable correspondence estimation. Intra-video learning transforms image content across frames within a single video via frame pair-wise affinity. To obtain a discriminative representation for instance-level separation, we go beyond intra-video analysis and construct an inter-video affinity that facilitates contrastive transformation across different videos. By enforcing transformation consistency between the intra- and inter-video levels, fine-grained correspondence associations are well preserved and instance-level feature discrimination is effectively reinforced. Our simple framework outperforms recent self-supervised correspondence methods on a range of visual tasks, including video object tracking (VOT), video object segmentation (VOS), and pose keypoint tracking. Notably, our method also surpasses the fully supervised affinity representation (e.g., ResNet) and performs competitively against recent fully supervised algorithms designed for specific tasks (e.g., VOT and VOS).
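
The idea of transforming content through an affinity, within one video and then across videos, can be illustrated with a toy sketch. This is not the authors' implementation: the dot-product similarity, the softmax normalization, the temperature value, the mean-squared consistency measure, and every function and variable name below are assumptions chosen for illustration, and the data are made up.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def affinity(target_feats, ref_feats, temperature=0.1):
    """Row i holds the weights linking target location i to each reference location.

    Assumption: similarity is a dot product scaled by a temperature.
    """
    rows = []
    for t in target_feats:
        sims = [sum(a * b for a, b in zip(t, r)) / temperature for r in ref_feats]
        rows.append(softmax(sims))
    return rows


def transform(aff, ref_values):
    """Rebuild target content as affinity-weighted sums of reference content."""
    dim = len(ref_values[0])
    return [
        [sum(w * v[d] for w, v in zip(row, ref_values)) for d in range(dim)]
        for row in aff
    ]


def mean_squared_error(a, b):
    """Average squared difference between two equally shaped nested lists."""
    count = sum(len(row) for row in a)
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / count


# Hypothetical toy data: two locations per frame, 2-D features, 1-D content.
target_feats = [[1.0, 0.0], [0.0, 1.0]]   # target frame of video A
ref_feats = [[1.0, 0.0], [0.0, 1.0]]      # reference frame of video A
ref_values = [[1.0], [2.0]]               # content to propagate (e.g. intensity)
other_feats = [[-1.0, 0.0], [0.0, -1.0]]  # frame from a different video B
other_values = [[9.0], [9.0]]

# Intra-video: affinity between two frames of the same video.
intra_aff = affinity(target_feats, ref_feats)
intra_out = transform(intra_aff, ref_values)

# Inter-video: the reference set is extended with locations from another video.
inter_aff = affinity(target_feats, ref_feats + other_feats)
inter_out = transform(inter_aff, ref_values + other_values)

# Consistency between the two transformations (assumed here to be an MSE).
consistency = mean_squared_error(intra_out, inter_out)
```

In this sketch the consistency term stays small only when the features of the other video attract little affinity mass, which is one way to read the abstract's claim that consistency between the two levels reinforces instance-level discrimination.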

Published

2021-05-18

How to Cite

Wang, N., Zhou, W., & Li, H. (2021). Contrastive Transformation for Self-supervised Correspondence Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 35(11), 10174-10182. https://doi.org/10.1609/aaai.v35i11.17220

Section

AAAI Technical Track on Machine Learning IV