USDRL: Unified Skeleton-Based Dense Representation Learning with Multi-Grained Feature Decorrelation

Authors

  • Wanjiang Weng School of Computer Science and Engineering, Southeast University, Nanjing 210096, China Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China
  • Hongsong Wang School of Computer Science and Engineering, Southeast University, Nanjing 210096, China Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China
  • Junbo Wang School of Software, Northwestern Polytechnical University, Xi'an 710072, China
  • Lei He School of Vehicle and Mobility, Tsinghua University, Beijing, China
  • Guo-Sen Xie School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China

DOI:

https://doi.org/10.1609/aaai.v39i8.32899

Abstract

Contrastive learning has recently achieved great success in skeleton-based representation learning. However, the prevailing methods are predominantly negative-based, necessitating an additional momentum encoder and memory bank to obtain negative samples, which increases the difficulty of model training. Furthermore, these methods primarily concentrate on learning a global representation for recognition and retrieval tasks, while overlooking the rich and detailed local representations that are crucial for dense prediction tasks. To alleviate these issues, we introduce a Unified Skeleton-based Dense Representation Learning framework based on feature decorrelation, called USDRL, which applies feature decorrelation across the temporal, spatial, and instance domains in a multi-grained manner, reducing redundancy among representation dimensions and thereby maximizing the information extracted from features. Additionally, we design a Dense Spatio-Temporal Encoder (DSTE) to effectively capture fine-grained action representations, thereby enhancing performance on dense prediction tasks. Comprehensive experiments on the NTU-60, NTU-120, PKU-MMD I, and PKU-MMD II benchmarks, across diverse downstream tasks including action recognition, action retrieval, and action detection, conclusively demonstrate that our approach significantly outperforms current state-of-the-art (SOTA) approaches.
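The paper's exact multi-grained decorrelation objective is defined in the full text; as a rough illustration of the redundancy-reduction idea the abstract describes, the sketch below implements a generic negative-free feature-decorrelation loss in the Barlow Twins style (cross-correlate two views' features, push the diagonal toward 1 and the off-diagonal toward 0). The function name and the `lam` weight are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def decorrelation_loss(z1, z2, lam=0.005):
    """Illustrative negative-free decorrelation objective (Barlow Twins style).

    z1, z2: (N, D) feature matrices from two augmented views of the same
    skeleton sequences. The loss drives matching dimensions to agree
    (diagonal -> 1) while decorrelating distinct dimensions
    (off-diagonal -> 0), reducing redundancy without negative samples.
    """
    n = z1.shape[0]
    # Standardize each feature dimension across the batch.
    z1 = (z1 - z1.mean(axis=0)) / (z1.std(axis=0) + 1e-8)
    z2 = (z2 - z2.mean(axis=0)) / (z2.std(axis=0) + 1e-8)
    # Cross-correlation matrix between the two views' dimensions: (D, D).
    c = z1.T @ z2 / n
    diag = np.diagonal(c)
    on_diag = ((diag - 1.0) ** 2).sum()        # invariance term
    off_diag = (c ** 2).sum() - (diag ** 2).sum()  # redundancy term
    return on_diag + lam * off_diag
```

In USDRL this style of objective is applied in a multi-grained manner, i.e. over temporal, spatial, and instance-level features rather than a single global embedding.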

Published

2025-04-11

How to Cite

Weng, W., Wang, H., Wang, J., He, L., & Xie, G.-S. (2025). USDRL: Unified Skeleton-Based Dense Representation Learning with Multi-Grained Feature Decorrelation. Proceedings of the AAAI Conference on Artificial Intelligence, 39(8), 8332–8340. https://doi.org/10.1609/aaai.v39i8.32899

Section

AAAI Technical Track on Computer Vision VII