Hierarchical Consistent Contrastive Learning for Skeleton-Based Action Recognition with Growing Augmentations
DOI:
https://doi.org/10.1609/aaai.v37i3.25451
Keywords:
CV: Video Understanding & Activity Analysis, CV: Representation Learning for Vision, ML: Unsupervised & Self-Supervised Learning
Abstract
Contrastive learning has been proven beneficial for self-supervised skeleton-based action recognition. Most contrastive learning methods utilize carefully designed augmentations to generate different movement patterns of skeletons for the same semantics. However, applying strong augmentations, which distort the structure of images or skeletons and cause semantic loss, remains an open problem because they make training unstable. In this paper, we investigate the potential of adopting strong augmentations and propose a general hierarchical consistent contrastive learning framework (HiCLR) for skeleton-based action recognition. Specifically, we first design a gradual growing augmentation policy to generate multiple ordered positive pairs, which guide the learned representations toward consistency across different views. Then, an asymmetric loss is proposed to enforce the hierarchical consistency via a directional clustering operation in the feature space, pulling the representations from strongly augmented views closer to those from weakly augmented views for better generalizability. Meanwhile, we propose and evaluate three kinds of strong augmentations for 3D skeletons to demonstrate the effectiveness of our method. Extensive experiments show that HiCLR notably outperforms the state-of-the-art methods on three large-scale datasets, i.e., NTU60, NTU120, and PKUMMD. Our project is publicly available at: https://jhang2020.github.io/Projects/HiCLR/HiCLR.html.
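The two ideas in the abstract, ordered views from a growing augmentation policy and an asymmetric loss that pulls strongly augmented views toward weakly augmented ones, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the reference bank of embeddings, the temperature value, and the choice of a KL divergence between similarity distributions are all assumptions made for illustration.

```python
import numpy as np

def growing_views(x, augmentations):
    """Apply augmentations cumulatively so that view k is at least as
    strongly augmented as view k-1 (hypothetical helper)."""
    views, current = [], x
    for aug in augmentations:
        current = aug(current)
        views.append(current)
    return views

def _l2_normalize(x, eps=1e-12):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def _softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def similarity_distribution(z, bank, temperature=0.1):
    """Softmax over cosine similarities between embeddings z (N, D)
    and a bank of reference embeddings (K, D). The bank is an assumption."""
    logits = _l2_normalize(z) @ _l2_normalize(bank).T / temperature
    return _softmax(logits)

def directional_consistency_loss(z_weak, z_strong, bank, temperature=0.1, eps=1e-12):
    """Assumed asymmetric loss: KL(p_weak || p_strong), averaged over the batch.
    The weak view acts as the target; in an autograd framework it would be
    detached so that only the strong view is pulled toward it."""
    p_weak = similarity_distribution(z_weak, bank, temperature)
    p_strong = similarity_distribution(z_strong, bank, temperature)
    kl = np.sum(p_weak * (np.log(p_weak + eps) - np.log(p_strong + eps)), axis=-1)
    return float(kl.mean())

def hierarchical_loss(view_embeddings, bank, temperature=0.1):
    """Sum the directional loss over adjacent pairs of embeddings,
    ordered from the weakest to the strongest augmentation."""
    return sum(
        directional_consistency_loss(view_embeddings[i], view_embeddings[i + 1], bank, temperature)
        for i in range(len(view_embeddings) - 1)
    )
```

In this sketch the loss is zero when a strong view already matches its weaker neighbor and grows as their similarity distributions diverge. Because the weak view is treated as a fixed target, the relation is directional rather than symmetric.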
Published
2023-06-26
How to Cite
Zhang, J., Lin, L., & Liu, J. (2023). Hierarchical Consistent Contrastive Learning for Skeleton-Based Action Recognition with Growing Augmentations. Proceedings of the AAAI Conference on Artificial Intelligence, 37(3), 3427-3435. https://doi.org/10.1609/aaai.v37i3.25451
Section
AAAI Technical Track on Computer Vision III