Joint Implicit and Explicit Language Learning for Pedestrian Attribute Recognition

Yukang Zhang; Lei Tan; Yang Lu; Yan Yan; Hanzi Wang

doi:10.1609/aaai.v40i15.38296

Authors

Yukang Zhang Xiamen University
Lei Tan National University of Singapore
Yang Lu Xiamen University
Yan Yan Xiamen University
Hanzi Wang Xiamen University


Abstract

Pedestrian attribute recognition (PAR) has received increasing attention due to its wide application in video surveillance and pedestrian analysis. Some text-enhanced methods tackle this task by converting attributes into language descriptions to facilitate interactive learning between attributes and visual images. However, such generic descriptions cannot distinguish between different pedestrian images and miss their individual characteristics. In this paper, we propose a Joint Implicit and Explicit Language Guidance Enhancement Learning (JGEL) method, which converts each pedestrian image into its own language description and uses dual language learning to learn enhanced attribute information. Specifically, we first propose an Implicit Language Guidance Learning (ILGL) stream, which projects visual image features into the text embedding space to generate pseudo-word tokens, implicitly modeling image attributes and providing personalized descriptions. We further propose an Explicit Attribute Enhancement Learning (EAEL) stream, which explicitly guides the pseudo-word tokens generated by ILGL to align with the attribute concepts in the text embedding space. Extensive experiments show that JGEL significantly improves performance on both standard PAR and the challenging zero-shot PAR task.
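The two streams described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the linear projection `proj`, the helper names (`project_to_pseudo_tokens`, `attribute_logits`, `bce_alignment_loss`), the max-over-tokens cosine scoring, the `scale` factor, and the binary cross-entropy objective are all hypothetical choices made only to show the general idea. In this sketch, an image feature is mapped to a few pseudo-word tokens in a text embedding space (ILGL-like), and those tokens are then scored against attribute text embeddings and pushed toward the ground-truth attribute labels (EAEL-like). Everything uses plain Python with random vectors, so no real encoder is involved.

```python
import math
import random

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def project_to_pseudo_tokens(visual_feat, proj, num_tokens, text_dim):
    """ILGL-like step (assumed form): map one image feature to `num_tokens`
    pseudo-word tokens in the text embedding space. `proj` is a hypothetical
    learned matrix of shape (num_tokens * text_dim, len(visual_feat))."""
    flat = matvec(proj, visual_feat)
    return [flat[k * text_dim:(k + 1) * text_dim] for k in range(num_tokens)]

def cosine(a, b):
    """Cosine similarity; the small epsilon avoids division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-8)

def attribute_logits(tokens, attr_embeds, scale=10.0):
    """EAEL-like scoring (assumed form): each attribute's logit is the best
    cosine similarity between any pseudo-token and that attribute's text
    embedding, multiplied by a temperature-like `scale`."""
    return [scale * max(cosine(t, a) for t in tokens) for a in attr_embeds]

def bce_alignment_loss(logits, labels):
    """Mean binary cross-entropy between attribute logits and 0/1 labels,
    pulling tokens toward present attributes and away from absent ones."""
    total = 0.0
    for z, y in zip(logits, labels):
        p = 1.0 / (1.0 + math.exp(-z))
        p = min(max(p, 1e-7), 1 - 1e-7)  # clip to keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(logits)

# Toy usage with random vectors standing in for real encoder outputs.
random.seed(0)
visual_dim, text_dim, num_tokens = 8, 4, 2
visual_feat = [random.gauss(0, 1) for _ in range(visual_dim)]
proj = [[random.gauss(0, 0.1) for _ in range(visual_dim)]
        for _ in range(num_tokens * text_dim)]
# Hypothetical embeddings for three attributes, e.g. "male", "backpack", "long hair".
attr_embeds = [[random.gauss(0, 1) for _ in range(text_dim)] for _ in range(3)]

tokens = project_to_pseudo_tokens(visual_feat, proj, num_tokens, text_dim)
logits = attribute_logits(tokens, attr_embeds)
loss = bce_alignment_loss(logits, [1, 0, 1])
print(len(tokens), len(tokens[0]), round(loss, 4))
```

In a trainable version, `proj` would be learned by minimizing the alignment loss. The max over tokens is one simple way to let any pseudo-word token account for a given attribute. The abstract does not specify how JGEL actually combines the tokens.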
