Hierarchical Consistent Contrastive Learning for Skeleton-Based Action Recognition with Growing Augmentations

Authors

  • Jiahang Zhang, Wangxuan Institute of Computer Technology, Peking University
  • Lilang Lin, Wangxuan Institute of Computer Technology, Peking University
  • Jiaying Liu, Wangxuan Institute of Computer Technology, Peking University

DOI:

https://doi.org/10.1609/aaai.v37i3.25451

Keywords:

CV: Video Understanding & Activity Analysis, CV: Representation Learning for Vision, ML: Unsupervised & Self-Supervised Learning

Abstract

Contrastive learning has proven beneficial for self-supervised skeleton-based action recognition. Most contrastive learning methods use carefully designed augmentations to generate different movement patterns of skeletons that share the same semantics. However, applying strong augmentations, which distort the structure of images or skeletons and cause semantic loss, remains an open problem because they make training unstable. In this paper, we investigate the potential of adopting strong augmentations and propose a general hierarchical consistent contrastive learning framework (HiCLR) for skeleton-based action recognition. Specifically, we first design a gradually growing augmentation policy that generates multiple ordered positive pairs, which guide the model toward consistent representations across different views. We then propose an asymmetric loss that enforces hierarchical consistency through a directional clustering operation in the feature space, pulling the representations of strongly augmented views closer to those of weakly augmented views for better generalizability. In addition, we propose and evaluate three kinds of strong augmentations for 3D skeletons to demonstrate the effectiveness of our method. Extensive experiments show that HiCLR notably outperforms state-of-the-art methods on three large-scale datasets, i.e., NTU60, NTU120, and PKUMMD. Our project is publicly available at: https://jhang2020.github.io/Projects/HiCLR/HiCLR.html.
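
The two ideas named in the abstract, a growing augmentation policy and a directional consistency loss, can be illustrated with a minimal sketch. This is not the authors' implementation. All function names are hypothetical. The three augmentations (mirror, jitter, joint masking) are placeholders and are not claimed to be the three strong augmentations proposed in the paper. The loss is written as a KL divergence between similarity distributions over a set of anchor embeddings, which is only one plausible reading of "directional clustering". Because the sketch is plain Python without automatic differentiation, the directionality appears only in which view serves as the target.

```python
import math
import random

# A skeleton sequence is a list of frames; each frame is a list of (x, y, z) joints.

def mirror(seq):
    """Placeholder weak augmentation: flip the x axis."""
    return [[(-x, y, z) for (x, y, z) in frame] for frame in seq]

def jitter(seq, scale=0.05, seed=0):
    """Placeholder augmentation: add Gaussian noise to every coordinate."""
    rng = random.Random(seed)
    return [[tuple(c + rng.gauss(0.0, scale) for c in joint) for joint in frame]
            for frame in seq]

def mask_joints(seq, ratio=0.5, seed=0):
    """Placeholder strong augmentation: zero out a fixed random subset of joints."""
    rng = random.Random(seed)
    n = len(seq[0])
    masked = set(rng.sample(range(n), int(n * ratio)))
    return [[(0.0, 0.0, 0.0) if j in masked else joint
             for j, joint in enumerate(frame)] for frame in seq]

def growing_views(seq, augmentations):
    """View k applies augmentations[0..k-1] cumulatively, so views are ordered weak to strong."""
    views = [seq]
    for aug in augmentations:
        views.append(aug(views[-1]))
    return views

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def directional_kl(z_weak, z_strong, anchors, tau=0.2):
    """KL(p_weak || p_strong): the weak view is the target the strong view is pulled toward.
    In a real training loop the weak branch would typically be excluded from the gradient."""
    p_w = softmax([cosine(z_weak, a) / tau for a in anchors])
    p_s = softmax([cosine(z_strong, a) / tau for a in anchors])
    return sum(pw * math.log(pw / ps) for pw, ps in zip(p_w, p_s))

def hierarchical_loss(embeddings, anchors, tau=0.2):
    """Sum the directional term over consecutive (weaker, stronger) view pairs."""
    return sum(directional_kl(embeddings[k - 1], embeddings[k], anchors, tau)
               for k in range(1, len(embeddings)))

def embed(seq):
    """Toy stand-in for an encoder: flatten the sequence into one vector."""
    return [c for frame in seq for joint in frame for c in joint]

seq = [[(1.0, 2.0, 3.0), (0.5, -1.0, 2.0)],
       [(1.1, 2.1, 2.9), (0.4, -0.9, 2.2)]]
views = growing_views(seq, [mirror, jitter, mask_joints])
rng = random.Random(1)
anchors = [[rng.uniform(-1.0, 1.0) for _ in range(12)] for _ in range(8)]
loss = hierarchical_loss([embed(v) for v in views], anchors)
```

Chaining the augmentations means each view differs from its predecessor by exactly one additional transformation, which is what makes the positive pairs ordered. Pairing only consecutive views is a design choice of this sketch, not something the abstract specifies.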

Published

2023-06-26

How to Cite

Zhang, J., Lin, L., & Liu, J. (2023). Hierarchical Consistent Contrastive Learning for Skeleton-Based Action Recognition with Growing Augmentations. Proceedings of the AAAI Conference on Artificial Intelligence, 37(3), 3427-3435. https://doi.org/10.1609/aaai.v37i3.25451

Section

AAAI Technical Track on Computer Vision III