Gait Transformer: End-to-End Transformer Backbone for Gait Recognition

Authors

  • Saihui Hou, Beijing Normal University
  • Wenpeng Lang, Beijing Normal University
  • Jilong Wang, University of Science and Technology of China; Institute of Automation, Chinese Academy of Sciences
  • Yan Huang, Institute of Automation, Chinese Academy of Sciences
  • Liang Wang, Institute of Automation, Chinese Academy of Sciences
  • Yongzhen Huang, Beijing Normal University; WATRIX.AI

DOI:

https://doi.org/10.1609/aaai.v40i6.42479

Abstract

Gait recognition has emerged as a promising biometric technique for long-distance and non-intrusive human identification. While Transformers have revolutionized vision tasks, their adaptation to gait recognition remains underexplored due to domain-specific challenges such as the sparse silhouette modality, spatial-temporal dynamics, fine-grained motion cues, and limited training data. In this paper, we propose Gait Transformer (GaT), an end-to-end Transformer backbone specifically tailored for silhouette-based gait recognition. GaT introduces three key components: (1) a hybrid patch embedding module that combines convolutional stems with group-batch normalization to enhance structural preservation; (2) a decomposed token mixer that explicitly models both short-range and long-range dependencies across the spatial-temporal dimensions; and (3) a hybrid positional encoding strategy that integrates absolute, relative, and rotary embeddings to support efficient training under data scarcity. Without relying on any pretraining, GaT achieves state-of-the-art performance on Gait3D, GREW, and CCGR-MINI.
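The abstract's third component mixes absolute, relative, and rotary positional embeddings. As an illustration of the rotary part only, here is a minimal pure-Python sketch of a generic rotary position embedding (standard RoPE, not necessarily the paper's exact formulation): consecutive feature pairs are rotated by a position-dependent angle, so that dot products between query and key tokens depend only on their relative offset.

```python
import math

def rotary_embed(x, pos, base=10000.0):
    """Apply a generic rotary position embedding (RoPE) to a feature vector.

    x    : list of floats with even length d
    pos  : integer token position
    base : frequency base controlling the per-pair rotation rate

    Each pair (x[2i], x[2i+1]) is rotated by pos * base**(-2i/d), so the
    angle difference between two tokens encodes their relative offset.
    """
    d = len(x)
    out = []
    for i in range(0, d, 2):
        angle = pos * (base ** (-i / d))
        c, s = math.cos(angle), math.sin(angle)
        out.append(x[i] * c - x[i + 1] * s)   # rotated first coordinate
        out.append(x[i] * s + x[i + 1] * c)   # rotated second coordinate
    return out
```

The relative-offset property can be checked directly: the dot product of a query rotated to position m and a key rotated to position n equals that of the same vectors at positions m-k and n-k for any shift k, which is why rotary embeddings pair well with the relative component of a hybrid encoding.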

Published

2026-03-14

How to Cite

Hou, S., Lang, W., Wang, J., Huang, Y., Wang, L., & Huang, Y. (2026). Gait Transformer: End-to-End Transformer Backbone for Gait Recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 40(6), 4771–4779. https://doi.org/10.1609/aaai.v40i6.42479

Section

AAAI Technical Track on Computer Vision III