Frame-Guided Region-Aligned Representation for Video Person Re-Identification

Authors

  • Zengqun Chen, South China University of Technology
  • Zhiheng Zhou, South China University of Technology
  • Junchu Huang, South China University of Technology
  • Pengyu Zhang, South China University of Technology
  • Bo Li, South China University of Technology

DOI:

https://doi.org/10.1609/aaai.v34i07.6632

Abstract

Pedestrians in videos are usually in motion, which causes serious spatial misalignment such as scale variations and pose changes and makes video-based person re-identification more challenging. To address this issue, in this paper we propose a Frame-Guided Region-Aligned model (FGRA) that learns discriminative representations in two steps in an end-to-end manner. First, a novel alignment mechanism, built on a frame-guided feature learning strategy and a non-parametric alignment module, is proposed to extract well-aligned region features. Second, to form a sequence representation, an effective feature aggregation strategy that utilizes temporal alignment scores and spatial attention is adopted to fuse region features in the temporal and spatial dimensions, respectively. Experiments on benchmark datasets demonstrate the effectiveness of the proposed method in solving the misalignment problem and its superiority over existing video-based person re-identification methods.
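
The aggregation step described in the abstract can be illustrated with a short sketch. The code below is not the authors' implementation; the tensor shapes, the function name aggregate_sequence, and the use of softmax-normalized weights are illustrative assumptions about how temporal alignment scores and spatial attention might fuse region features into a single sequence representation.

    # Minimal sketch (assumed shapes and score definitions, not the paper's code):
    # fuse per-frame region features with temporal alignment scores and
    # spatial attention to obtain a sequence-level representation.
    import torch
    import torch.nn.functional as F

    def aggregate_sequence(region_feats, align_scores, spatial_logits):
        """
        region_feats:   (T, R, C) region features for T frames, R regions, C channels
        align_scores:   (T, R)    temporal alignment score of each region per frame
        spatial_logits: (R,)      spatial attention logits over regions
        returns:        (C,)      sequence-level representation
        """
        # Temporal fusion: weight each region's features across frames by its
        # softmax-normalized alignment score, yielding one feature per region.
        t_weights = F.softmax(align_scores, dim=0)                          # (T, R)
        region_seq = (t_weights.unsqueeze(-1) * region_feats).sum(dim=0)    # (R, C)

        # Spatial fusion: combine the region features with attention weights.
        s_weights = F.softmax(spatial_logits, dim=0)                        # (R,)
        return (s_weights.unsqueeze(-1) * region_seq).sum(dim=0)            # (C,)

    # Toy usage: 8 frames, 4 body regions, 256-dimensional features.
    feats = torch.randn(8, 4, 256)
    scores = torch.randn(8, 4)
    logits = torch.randn(4)
    print(aggregate_sequence(feats, scores, logits).shape)  # torch.Size([256])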

Published

2020-04-03

How to Cite

Chen, Z., Zhou, Z., Huang, J., Zhang, P., & Li, B. (2020). Frame-Guided Region-Aligned Representation for Video Person Re-Identification. Proceedings of the AAAI Conference on Artificial Intelligence, 34(07), 10591-10598. https://doi.org/10.1609/aaai.v34i07.6632

Section

AAAI Technical Track: Vision