Spatio-Temporal Difference Descriptor for Skeleton-Based Action Recognition

Authors

  • Chongyang Ding Xidian University
  • Kai Liu Xidian University
  • Jari Korhonen Shenzhen University
  • Evgeny Belyaev ITMO University

DOI:

https://doi.org/10.1609/aaai.v35i2.16210

Keywords:

Video Understanding & Activity Analysis

Abstract

In skeletal representations, intra-frame differences between body joints, as well as inter-frame dynamics between body skeletons, contain discriminative information for action recognition. Conventional methods for modeling human skeleton sequences generally depend on motion trajectory and body joint dependency information, and thus lack the ability to identify the inherent differences of human skeletons. In this paper, we propose a spatio-temporal difference descriptor based on a directional convolution architecture that enables us to learn the spatio-temporal differences and contextual dependencies between different body joints simultaneously. The overall model is built on a deep symmetric positive definite (SPD) metric learning architecture designed to learn discriminative manifold features with a well-designed non-linear mapping operation. Experiments on several action datasets show that our proposed method achieves an accuracy improvement of up to 3% over state-of-the-art methods.
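
To make the two kinds of difference named in the abstract concrete, the following is a minimal NumPy sketch and not the paper's directional convolution architecture. It assumes a skeleton sequence stored as an array of shape (frames, joints, 3); the function names, the array layout, and the use of a regularized covariance matrix as the SPD feature are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np


def spatial_differences(seq):
    """Intra-frame differences: offset between every pair of joints.

    seq has shape (T, J, 3); the result has shape (T, J, J, 3), where
    result[t, i, j] = seq[t, i] - seq[t, j]. (Assumed layout.)
    """
    return seq[:, :, None, :] - seq[:, None, :, :]


def temporal_differences(seq):
    """Inter-frame differences: displacement of each joint between frames.

    seq has shape (T, J, 3); the result has shape (T - 1, J, 3).
    """
    return seq[1:] - seq[:-1]


def spd_covariance(features, eps=1e-6):
    """Build an SPD matrix from per-frame feature vectors.

    features has shape (N, D). A sample covariance is symmetric positive
    semi-definite; adding eps * I makes it strictly positive definite.
    This is one common way to obtain SPD inputs, assumed here for
    illustration only.
    """
    centered = features - features.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / max(len(features) - 1, 1)
    return cov + eps * np.eye(cov.shape[0])


# Usage on synthetic data: 20 frames, 5 joints, 3D coordinates.
rng = np.random.default_rng(0)
seq = rng.normal(size=(20, 5, 3))

spatial = spatial_differences(seq)      # (20, 5, 5, 3)
temporal = temporal_differences(seq)    # (19, 5, 3)
spd = spd_covariance(temporal.reshape(len(temporal), -1))  # (15, 15)
```

In this sketch the differences are fixed, hand-computed quantities, whereas the abstract describes a model that learns them together with contextual dependencies between joints.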

Published

2021-05-18

How to Cite

Ding, C., Liu, K., Korhonen, J., & Belyaev, E. (2021). Spatio-Temporal Difference Descriptor for Skeleton-Based Action Recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 35(2), 1227-1235. https://doi.org/10.1609/aaai.v35i2.16210

Issue

Section

AAAI Technical Track on Computer Vision I