Selective and Orthogonal Feature Activation for Pedestrian Attribute Recognition

Authors

  • Junyi Wu — AI Research Center, Xiamen Meiya Pico Information Company Ltd., Xiamen, China; Xiamen Meiya Pico Information Security Research Institute Company Ltd., Xiamen, China; College of Computer and Data Science, Fuzhou University, Fuzhou, China
  • Yan Huang — Institute of Automation, Chinese Academy of Sciences, Beijing, China
  • Min Gao — College of Physics and Information Engineering, Fuzhou University, Fuzhou, China
  • Yuzhen Niu — College of Computer and Data Science, Fuzhou University, Fuzhou, China
  • Mingjing Yang — College of Physics and Information Engineering, Fuzhou University, Fuzhou, China
  • Zhipeng Gao — AI Research Center, Xiamen Meiya Pico Information Company Ltd., Xiamen, China; Xiamen Meiya Pico Information Security Research Institute Company Ltd., Xiamen, China
  • Jianqiang Zhao — AI Research Center, Xiamen Meiya Pico Information Company Ltd., Xiamen, China; Xiamen Meiya Pico Information Security Research Institute Company Ltd., Xiamen, China

DOI:

https://doi.org/10.1609/aaai.v38i6.28419

Keywords:

CV: Image and Video Retrieval, CV: Representation Learning for Vision

Abstract

Pedestrian Attribute Recognition (PAR) involves identifying the attributes of individuals in person images. Existing PAR methods typically rely on CNNs as the backbone network to extract pedestrian features. However, CNNs process only one adjacent region at a time, losing the long-range relations between different attribute-specific regions. To address this limitation, we adopt the Vision Transformer (ViT) instead of a CNN as the backbone for PAR, aiming to model long-range relations and extract more robust features. However, PAR suffers from an inherent attribute imbalance: ViT naturally focuses on attributes that appear frequently in the training set and overlooks attributes that appear rarely, and the native features extracted by ViT cannot tolerate this imbalanced attribute distribution. To tackle this issue, we propose two novel components: the Selective Feature Activation Method (SFAM) and the Orthogonal Feature Activation Loss. SFAM selectively suppresses the most informative attribute-specific features, compelling the PAR model to capture discriminative features from regions that are easily overlooked. The proposed loss enforces an orthogonal constraint between the original features extracted by ViT and the suppressed features from SFAM, promoting the complementarity of the features in space. We conduct experiments on several benchmark PAR datasets, including PETA, PA100K, RAPv1, and RAPv2, demonstrating the effectiveness of our method. Specifically, our method outperforms existing state-of-the-art approaches, including GRL, IAA-Caps, ALM, and SSC, in terms of mA on all four datasets.
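To make the two components concrete, here is a minimal NumPy sketch of the ideas described above. The function names, the suppression-by-magnitude rule, and the squared-cosine form of the orthogonality term are illustrative assumptions, not the paper's actual implementation (which operates on ViT feature maps during training).

```python
import numpy as np

def selective_suppression(feat, ratio=0.3):
    """Illustrative SFAM-style step (assumed rule): zero out the
    highest-magnitude channels, which stand in for the 'more
    informative' attribute-specific features the model relies on."""
    k = int(len(feat) * ratio)
    idx = np.argsort(np.abs(feat))[-k:]  # indices of the top-k channels
    supp = feat.copy()
    supp[idx] = 0.0                      # suppress them
    return supp

def orthogonal_loss(f, g, eps=1e-8):
    """Assumed orthogonality penalty: squared cosine similarity
    between the original and suppressed feature vectors; driving it
    to zero pushes the two features toward orthogonal directions,
    i.e. toward encoding complementary information."""
    cos = f @ g / (np.linalg.norm(f) * np.linalg.norm(g) + eps)
    return float(cos ** 2)

# Example: suppress half the channels of a toy feature vector, then
# measure how far the original and suppressed features are from orthogonal.
f = np.array([3.0, 0.1, -2.0, 0.05])
f_supp = selective_suppression(f, ratio=0.5)
loss = orthogonal_loss(f, f_supp)
```

In training, a penalty of this kind would be added to the classification loss so that the features recovered from the suppressed branch occupy a subspace complementary to the original ViT features, rather than re-learning the same dominant attributes.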

Published

2024-03-24

How to Cite

Wu, J., Huang, Y., Gao, M., Niu, Y., Yang, M., Gao, Z., & Zhao, J. (2024). Selective and Orthogonal Feature Activation for Pedestrian Attribute Recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 38(6), 6039-6047. https://doi.org/10.1609/aaai.v38i6.28419

Section

AAAI Technical Track on Computer Vision V