Gait Transformer: End-to-End Transformer Backbone for Gait Recognition

Authors

  • Saihui Hou, Beijing Normal University
  • Wenpeng Lang, Beijing Normal University
  • Jilong Wang, University of Science and Technology of China; Institute of Automation, Chinese Academy of Sciences
  • Yan Huang, Institute of Automation, Chinese Academy of Sciences
  • Liang Wang, Institute of Automation, Chinese Academy of Sciences
  • Yongzhen Huang, Beijing Normal University; WATRIX.AI

DOI:

https://doi.org/10.1609/aaai.v40i6.42479

Abstract

Gait recognition has emerged as a promising biometric technique for long-distance and non-intrusive human identification. While Transformers have revolutionized vision tasks, their adaptation to gait recognition remains underexplored due to domain-specific challenges such as the sparse silhouette modality, spatial-temporal dynamics, fine-grained motion cues, and limited training data. In this paper, we propose Gait Transformer (GaT), an end-to-end Transformer backbone specifically tailored for silhouette-based gait recognition. GaT introduces three key components: (1) a hybrid patch embedding module that combines convolutional stems with group-batch normalization to enhance structural preservation; (2) a decomposed token mixer that explicitly models both short-range and long-range dependencies across the spatial-temporal dimensions; and (3) a hybrid positional encoding strategy that integrates absolute, relative, and rotary embeddings to support efficient training under data scarcity. Without relying on any pretraining, GaT achieves state-of-the-art performance on Gait3D, GREW, and CCGR-MINI.
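The abstract's third component mixes absolute, relative, and rotary positional embeddings. As an illustration of the rotary part only, here is a minimal pure-Python sketch of a generic rotary position embedding (standard RoPE, not necessarily the paper's exact formulation): consecutive feature pairs are rotated by a position-dependent angle, so that dot products between query and key tokens depend only on their relative offset.

```python
import math

def rotary_embed(x, pos, base=10000.0):
    """Apply a generic rotary position embedding (RoPE) to a feature vector.

    x    : list of floats with even length d
    pos  : integer token position
    base : frequency base controlling the per-pair rotation rate

    Each pair (x[2i], x[2i+1]) is rotated by pos * base**(-2i/d), so the
    angle difference between two tokens encodes their relative offset.
    """
    d = len(x)
    out = []
    for i in range(0, d, 2):
        angle = pos * (base ** (-i / d))
        c, s = math.cos(angle), math.sin(angle)
        out.append(x[i] * c - x[i + 1] * s)   # rotated first coordinate
        out.append(x[i] * s + x[i + 1] * c)   # rotated second coordinate
    return out
```

The relative-offset property can be checked directly: the dot product of a query rotated to position m and a key rotated to position n equals that of the same vectors at positions m-k and n-k for any shift k, which is why rotary embeddings pair well with the relative component of a hybrid encoding.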

Published

2026-03-14

How to Cite

Hou, S., Lang, W., Wang, J., Huang, Y., Wang, L., & Huang, Y. (2026). Gait Transformer: End-to-End Transformer Backbone for Gait Recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 40(6), 4771–4779. https://doi.org/10.1609/aaai.v40i6.42479

Section

AAAI Technical Track on Computer Vision III