Hierarchical Prompt Learning for Image- and Text-Based Person Re-Identification

Linhan Zhou; Shuang Li; Neng Dong; Yonghang Tai; Yafei Zhang; Huafeng Li

doi:10.1609/aaai.v40i16.38380

Authors

Linhan Zhou Kunming University of Science and Technology
Shuang Li Chongqing University of Posts and Telecommunications
Neng Dong Nanjing University of Science and Technology
Yonghang Tai Yunnan Normal University
Yafei Zhang Kunming University of Science and Technology
Huafeng Li Kunming University of Science and Technology

DOI:

https://doi.org/10.1609/aaai.v40i16.38380

Abstract

Person re-identification (ReID) aims to retrieve target pedestrian images given either visual queries (image-to-image, I2I) or textual descriptions (text-to-image, T2I). Although both tasks share a common retrieval objective, they pose distinct challenges: I2I emphasizes discriminative identity learning, while T2I requires accurate cross-modal semantic alignment. Existing methods often treat these tasks separately, which may lead to representation entanglement and suboptimal performance. To address this, we propose a unified framework named Hierarchical Prompt Learning (HPL), which leverages task-aware prompt modeling to jointly optimize both tasks. Specifically, we first introduce a Task-Routed Transformer, which incorporates dual classification tokens into a shared visual encoder to route features for I2I and T2I branches respectively. On top of this, we develop a hierarchical prompt generation scheme that integrates identity-level learnable tokens with instance-level pseudo-text tokens. These pseudo-tokens are derived from image or text features via modality-specific inversion networks, injecting fine-grained, instance-specific semantics into the prompts. Furthermore, we propose a Cross-Modal Prompt Regularization strategy to enforce semantic alignment in the prompt token space, ensuring that pseudo-prompts preserve source-modality characteristics while enhancing cross-modal transferability. Extensive experiments on multiple ReID benchmarks validate the effectiveness of our method, achieving state-of-the-art performance on both I2I and T2I tasks.
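
The hierarchical prompt described above can be illustrated with a small toy sketch. The abstract does not give the architecture of the inversion networks or the exact form of the Cross-Modal Prompt Regularization, so everything concrete here is an assumption made for illustration: each inversion network is reduced to a single random linear map, the regularizer is taken to be one minus cosine similarity, and the names `identity_bank`, `build_prompt`, and `alignment_penalty` are hypothetical. It is not the authors' implementation.

```python
import math
import random

DIM = 8            # token embedding width (illustrative value)
NUM_ID_TOKENS = 4  # learnable tokens per identity (illustrative value)


def make_matrix(rows, cols, rng):
    """Small random matrix standing in for learned weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def linear(weights, vector):
    """Matrix-vector product: one output value per weight row."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def build_prompt(identity_tokens, pseudo_token):
    """Identity-level tokens followed by one instance-level pseudo-token."""
    return identity_tokens + [pseudo_token]


def alignment_penalty(image_pseudo, text_pseudo):
    """Assumed regularizer: 1 - cosine similarity of the two pseudo-tokens."""
    return 1.0 - cosine(image_pseudo, text_pseudo)


rng = random.Random(0)

# One set of identity-level tokens per person ID (random here, learned in practice).
identity_bank = {pid: make_matrix(NUM_ID_TOKENS, DIM, rng) for pid in range(3)}

# Hypothetical modality-specific inversion networks, reduced to linear maps.
image_inversion = make_matrix(DIM, DIM, rng)
text_inversion = make_matrix(DIM, DIM, rng)

# Stand-ins for encoder outputs of one image and its paired description.
image_feature = [rng.gauss(0, 1) for _ in range(DIM)]
text_feature = [rng.gauss(0, 1) for _ in range(DIM)]

image_pseudo = linear(image_inversion, image_feature)
text_pseudo = linear(text_inversion, text_feature)

prompt = build_prompt(identity_bank[1], image_pseudo)
penalty = alignment_penalty(image_pseudo, text_pseudo)
```

In this sketch the prompt for identity 1 holds `NUM_ID_TOKENS` shared tokens plus one token specific to the current image, and the penalty shrinks toward zero as the image-derived and text-derived pseudo-tokens point in the same direction.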
