Image-Text Knowledge Modeling for Unsupervised Multi-Scenario Person Re-Identification

Zhiqi Pang; Lingling Zhao; Yang Liu; Chunyu Wang; Gaurav Sharma

doi:10.1609/aaai.v40i10.37775

Authors

Zhiqi Pang, Faculty of Computing, Harbin Institute of Technology, Harbin, China
Lingling Zhao, Faculty of Computing, Harbin Institute of Technology, Harbin, China
Yang Liu, Faculty of Computing, Harbin Institute of Technology, Harbin, China
Chunyu Wang, Faculty of Computing, Harbin Institute of Technology, Harbin, China
Gaurav Sharma, Department of Electrical and Computer Engineering, University of Rochester, Rochester, USA

DOI:

https://doi.org/10.1609/aaai.v40i10.37775

Abstract

We propose unsupervised multi-scenario (UMS) person re-identification (ReID) as a new task that expands ReID across diverse scenarios (cross-resolution, clothing change, etc.) within a single coherent framework. To tackle UMS-ReID, we introduce image-text knowledge modeling (ITKM), a three-stage framework that effectively exploits the representational power of vision-language models. We start with a pre-trained CLIP model with an image encoder and a text encoder. In Stage I, we introduce a scenario embedding in the image encoder and fine-tune the encoder to adaptively leverage knowledge from multiple scenarios. In Stage II, we optimize a set of learned text embeddings to associate with pseudo-labels from Stage I and introduce a multi-scenario separation loss to increase the divergence between inter-scenario text representations. In Stage III, we first introduce cluster-level and instance-level heterogeneous matching modules to obtain reliable heterogeneous positive pairs (e.g., a visible image and an infrared image of the same person) within each scenario. Next, we propose a dynamic text representation update strategy to maintain consistency between text and image supervision signals. Experimental results across multiple scenarios demonstrate the superiority and generalizability of ITKM; it not only outperforms existing scenario-specific methods but also enhances overall performance by integrating knowledge from multiple scenarios.
