GaitCycFormer: Leveraging Gait Cycles and Transformers for Gait Emotion Recognition

Authors

  • Qingyang Zeng — State Key Laboratory for Novel Software Technology, Nanjing University; Department of Computer Science and Technology, Nanjing University
  • Lin Shang — State Key Laboratory for Novel Software Technology, Nanjing University; Department of Computer Science and Technology, Nanjing University

DOI:

https://doi.org/10.1609/aaai.v39i9.33064

Abstract

Gait Emotion Recognition (GER) is an emerging task within Human Emotion Recognition. Skeleton-based GER requires discriminative spatial and temporal features. However, current methods focus primarily on capturing spatial topology information and fail to effectively learn temporal features across long-distance frames. Moreover, most of these methods are sensitive to the order of sampled sequences, suffering significant accuracy drops when sequences are randomly sampled. To obtain a more robust and comprehensive spatial-temporal representation of gait, we introduce the Graph-Transformer architecture into GER for the first time, proposing a novel framework named GaitCycFormer. Specifically, we design a Cycle Position Encoding (CPE) based on the gait cycle, which explicitly segments any gait sequence into more manageable periodic units to enhance temporal feature modeling. Additionally, we incorporate a bi-level Transformer, consisting of an Intra-cycle Transformer and an Inter-cycle Transformer, to capture local temporal information within each gait cycle and global temporal information between gait cycles, respectively. Experiments demonstrate that GaitCycFormer achieves state-of-the-art performance on popular datasets and proves to be more reliable and robust.
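The bi-level design described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the cycle length, pooling choice, and the learned-embedding stand-in for Cycle Position Encoding are all assumptions for illustration; the paper defines CPE from detected gait cycles. The sketch segments a frame sequence into fixed-length cycles, applies an intra-cycle Transformer to frames within each cycle, summarizes each cycle into one token, and applies an inter-cycle Transformer across cycle summaries.

```python
import torch
import torch.nn as nn

class BiLevelGaitEncoder(nn.Module):
    """Hypothetical sketch of a cycle-based bi-level Transformer (not the
    published GaitCycFormer implementation)."""

    def __init__(self, feat_dim=64, cycle_len=16, n_heads=4):
        super().__init__()
        make_layer = lambda: nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=n_heads, batch_first=True)
        self.intra = nn.TransformerEncoder(make_layer(), num_layers=1)
        self.inter = nn.TransformerEncoder(make_layer(), num_layers=1)
        # Stand-in for Cycle Position Encoding: a learned embedding per
        # within-cycle position (an assumption; the paper's CPE differs).
        self.cpe = nn.Embedding(cycle_len, feat_dim)
        self.cycle_len = cycle_len

    def forward(self, x):
        # x: (batch, frames, feat_dim); frames must divide into whole cycles
        B, T, D = x.shape
        C = T // self.cycle_len
        x = x.view(B * C, self.cycle_len, D)          # split into cycles
        pos = torch.arange(self.cycle_len, device=x.device)
        x = x + self.cpe(pos)                         # add within-cycle positions
        x = self.intra(x)                             # local: frames in a cycle
        cycles = x.mean(dim=1).view(B, C, D)          # one summary per cycle
        return self.inter(cycles)                     # global: across cycles

model = BiLevelGaitEncoder()
out = model(torch.randn(2, 48, 64))  # 2 clips, 3 cycles of 16 frames each
print(out.shape)                     # (2, 3, 64): one token per gait cycle
```

The output keeps one token per cycle, so a downstream emotion classifier would pool over the cycle dimension; per-frame skeleton features are assumed to have been extracted by a spatial (graph) encoder beforehand.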

Published

2025-04-11

How to Cite

Zeng, Q., & Shang, L. (2025). GaitCycFormer: Leveraging Gait Cycles and Transformers for Gait Emotion Recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 39(9), 9815–9823. https://doi.org/10.1609/aaai.v39i9.33064

Section

AAAI Technical Track on Computer Vision VIII