Dynamic Semantic-Based Spatial Graph Convolution Network for Skeleton-Based Human Action Recognition

Authors

  • Jianyang Xie, CDT in Distributed Algorithms, School of EEE&CS, University of Liverpool, UK; Department of Eye and Vision Sciences, University of Liverpool, Liverpool, UK
  • Yanda Meng, Department of Eye and Vision Sciences, University of Liverpool, Liverpool, UK; Liverpool Centre for Cardiovascular Science, Liverpool, UK
  • Yitian Zhao, Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, CAS, Cixi, China
  • Anh Nguyen, Department of Computer Sciences, University of Liverpool, Liverpool, UK
  • Xiaoyun Yang, Remark AI UK Limited, London, UK
  • Yalin Zheng, Department of Eye and Vision Sciences, University of Liverpool, Liverpool, UK; Liverpool Centre for Cardiovascular Science, Liverpool, UK

DOI:

https://doi.org/10.1609/aaai.v38i6.28440

Keywords:

CV: Video Understanding & Activity Analysis, DMKM: Mining of Spatial, Temporal or Spatio-Temporal Data, HAI: Applications, ML: Deep Neural Architectures and Foundation Models, ML: Graph-based Machine Learning

Abstract

Graph convolutional networks (GCNs) have attracted great attention and achieved remarkable performance in skeleton-based action recognition. However, most previous works refine the skeleton topology without considering the types of the different joints and edges, which prevents them from representing semantic information. In this paper, we propose a dynamic semantic-based graph convolution network (DS-GCN) for skeleton-based human action recognition, in which joint and edge types are encoded into the skeleton topology in an implicit way. Specifically, two semantic modules are proposed: a joint type-aware adaptive topology and an edge type-aware adaptive topology. Combining the proposed semantic modules with temporal convolution yields a powerful framework, named DS-GCN, for skeleton-based action recognition. Extensive experiments on two datasets, NTU-RGB+D and Kinetics-400, show that the proposed semantic modules are general enough to be plugged into various backbones to boost recognition accuracy, and that DS-GCN notably outperforms state-of-the-art methods. The code is released at https://github.com/davelailai/DS-GCN.
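The abstract only describes the semantic modules at a high level, so the following is a minimal PyTorch sketch of the general idea: a spatial graph convolution whose fixed skeleton topology is refined by learned joint-type and edge-type embeddings. The class name TypeAwareAdaptiveGC, the argument joint_type_ids, the coarse bone/non-bone edge typing, and the way the type terms modulate the adjacency are illustrative assumptions, not the authors' exact DS-GCN formulation (see the released code for that).

# Sketch only: assumed layer/argument names; not the official DS-GCN implementation.
import torch
import torch.nn as nn


class TypeAwareAdaptiveGC(nn.Module):
    def __init__(self, in_channels, out_channels, base_adjacency,
                 joint_type_ids, num_joint_types, num_edge_types=2, embed_dim=16):
        super().__init__()
        # Fixed skeleton connectivity (V x V), e.g. the NTU-RGB+D bone graph.
        self.register_buffer("A", base_adjacency)
        # Integer type id per joint (e.g. torso / arm / leg), shape (V,).
        self.register_buffer("joint_type_ids", joint_type_ids)
        # Learnable embeddings for joint types and edge types.
        self.joint_type_emb = nn.Embedding(num_joint_types, embed_dim)
        # Assumed coarse edge typing: 0 = non-bone pair, 1 = physical bone.
        self.edge_type_emb = nn.Embedding(num_edge_types, embed_dim)
        self.scale = nn.Parameter(torch.zeros(1))  # gate so training starts from the plain graph
        self.proj = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def adaptive_topology(self):
        # Joint type-aware term: affinity between the type embeddings of every joint pair.
        j = self.joint_type_emb(self.joint_type_ids)               # (V, d)
        joint_term = torch.softmax(j @ j.t() / j.size(-1) ** 0.5, dim=-1)
        # Edge type-aware term: embed each pair's edge type and reduce to a scalar weight.
        edge_ids = (self.A > 0).long()                             # (V, V)
        edge_term = self.edge_type_emb(edge_ids).mean(dim=-1)      # (V, V)
        # Refine the fixed topology with the two semantic terms.
        return self.A + self.scale * (joint_term + edge_term)

    def forward(self, x):
        # x: (N, C, T, V) skeleton features; aggregate joints with the adaptive topology.
        A = self.adaptive_topology()                               # (V, V)
        x = torch.einsum("nctv,vw->nctw", x, A)
        return self.proj(x)


if __name__ == "__main__":
    V = 25                                   # joints in the NTU-RGB+D skeleton
    A = torch.eye(V)                         # placeholder adjacency; use the real bone graph in practice
    joint_types = torch.zeros(V, dtype=torch.long)
    layer = TypeAwareAdaptiveGC(3, 64, A, joint_types, num_joint_types=5)
    out = layer(torch.randn(2, 3, 30, V))    # (batch, channels, frames, joints)
    print(out.shape)                         # torch.Size([2, 64, 30, 25])

In a full model, layers of this kind would be interleaved with temporal convolutions over the frame dimension, matching the spatial-temporal framework the abstract describes.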

Published

2024-03-24

How to Cite

Xie, J., Meng, Y., Zhao, Y., Nguyen, A., Yang, X., & Zheng, Y. (2024). Dynamic Semantic-Based Spatial Graph Convolution Network for Skeleton-Based Human Action Recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 38(6), 6225-6233. https://doi.org/10.1609/aaai.v38i6.28440

Issue

Vol. 38 No. 6 (2024)

Section

AAAI Technical Track on Computer Vision V