Selective and Orthogonal Feature Activation for Pedestrian Attribute Recognition

Authors

  • Junyi Wu — AI Research Center, Xiamen Meiya Pico Information Company Ltd., Xiamen, China; Xiamen Meiya Pico Information Security Research Institute Company Ltd., Xiamen, China; College of Computer and Data Science, Fuzhou University, Fuzhou, China
  • Yan Huang — Institute of Automation, Chinese Academy of Sciences, Beijing, China
  • Min Gao — College of Physics and Information Engineering, Fuzhou University, Fuzhou, China
  • Yuzhen Niu — College of Computer and Data Science, Fuzhou University, Fuzhou, China
  • Mingjing Yang — College of Physics and Information Engineering, Fuzhou University, Fuzhou, China
  • Zhipeng Gao — AI Research Center, Xiamen Meiya Pico Information Company Ltd., Xiamen, China; Xiamen Meiya Pico Information Security Research Institute Company Ltd., Xiamen, China
  • Jianqiang Zhao — AI Research Center, Xiamen Meiya Pico Information Company Ltd., Xiamen, China; Xiamen Meiya Pico Information Security Research Institute Company Ltd., Xiamen, China

DOI:

https://doi.org/10.1609/aaai.v38i6.28419

Keywords:

CV: Image and Video Retrieval, CV: Representation Learning for Vision

Abstract

Pedestrian Attribute Recognition (PAR) involves identifying the attributes of individuals in person images. Existing PAR methods typically rely on CNNs as the backbone network to extract pedestrian features. However, CNNs process only one adjacent region at a time, losing the long-range relations between different attribute-specific regions. To address this limitation, we adopt the Vision Transformer (ViT) instead of a CNN as the backbone for PAR, aiming to model long-range relations and extract more robust features. However, PAR suffers from an inherent attribute imbalance: ViT naturally focuses on attributes that appear frequently in the training set and overlooks attributes that appear rarely, and the native features extracted by ViT cannot tolerate this imbalanced attribute distribution. To tackle this issue, we propose two novel components: the Selective Feature Activation Method (SFAM) and the Orthogonal Feature Activation Loss. SFAM selectively suppresses the most informative attribute-specific features, compelling the PAR model to capture discriminative features from regions that are easily overlooked. The proposed loss enforces an orthogonal constraint between the original features extracted by ViT and the suppressed features from SFAM, promoting the complementarity of the features in space. We conduct experiments on several benchmark PAR datasets, including PETA, PA100K, RAPv1, and RAPv2, demonstrating the effectiveness of our method. Specifically, our method outperforms existing state-of-the-art approaches, including GRL, IAA-Caps, ALM, and SSC, in terms of mA on all four datasets.
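To make the two components concrete, here is a minimal NumPy sketch of the ideas described above. The function names, the suppression-by-magnitude rule, and the squared-cosine form of the orthogonality term are illustrative assumptions, not the paper's actual implementation (which operates on ViT feature maps during training).

```python
import numpy as np

def selective_suppression(feat, ratio=0.3):
    """Illustrative SFAM-style step (assumed rule): zero out the
    highest-magnitude channels, which stand in for the 'more
    informative' attribute-specific features the model relies on."""
    k = int(len(feat) * ratio)
    idx = np.argsort(np.abs(feat))[-k:]  # indices of the top-k channels
    supp = feat.copy()
    supp[idx] = 0.0                      # suppress them
    return supp

def orthogonal_loss(f, g, eps=1e-8):
    """Assumed orthogonality penalty: squared cosine similarity
    between the original and suppressed feature vectors; driving it
    to zero pushes the two features toward orthogonal directions,
    i.e. toward encoding complementary information."""
    cos = f @ g / (np.linalg.norm(f) * np.linalg.norm(g) + eps)
    return float(cos ** 2)

# Example: suppress half the channels of a toy feature vector, then
# measure how far the original and suppressed features are from orthogonal.
f = np.array([3.0, 0.1, -2.0, 0.05])
f_supp = selective_suppression(f, ratio=0.5)
loss = orthogonal_loss(f, f_supp)
```

In training, a penalty of this kind would be added to the classification loss so that the features recovered from the suppressed branch occupy a subspace complementary to the original ViT features, rather than re-learning the same dominant attributes.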

Published

2024-03-24

How to Cite

Wu, J., Huang, Y., Gao, M., Niu, Y., Yang, M., Gao, Z., & Zhao, J. (2024). Selective and Orthogonal Feature Activation for Pedestrian Attribute Recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 38(6), 6039-6047. https://doi.org/10.1609/aaai.v38i6.28419

Section

AAAI Technical Track on Computer Vision V