TokenMatcher: Diverse Tokens Matching for Unsupervised Visible-Infrared Person Re-Identification

Authors

  • Xiao Wang — School of Computer Science and Technology, Wuhan University of Science and Technology; School of Computer Science, Wuhan University
  • Lekai Liu — School of Computer Science and Technology, Wuhan University of Science and Technology; Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan University of Science and Technology, China
  • Bin Yang — School of Computer Science, Wuhan University
  • Mang Ye — School of Computer Science, Wuhan University
  • Zheng Wang — School of Computer Science, Wuhan University
  • Xin Xu — School of Computer Science and Technology, Wuhan University of Science and Technology; Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan University of Science and Technology, China

DOI:

https://doi.org/10.1609/aaai.v39i8.32855

Abstract

Unsupervised visible-infrared person re-identification (US-VI-ReID) seeks to match infrared and visible images of the same individual without any annotations. Current methods typically derive cross-modal correspondences through a single global feature matching process to generate pseudo labels and learn modality-invariant features. However, this matching approach is hindered by both intra-modality and inter-modality discrepancies, which result in imprecise measurements. As a consequence, clustering individuals with a single global feature is often incomplete and unreliable, leading to suboptimal cross-modal clustering performance. To address these challenges and extract cross-modality discriminative identity information, we propose TokenMatcher, which comprises three key components: Diverse Tokens Matching (DTM), Diverse Tokens Neighbor Learning (DTNL), and the Homogeneous Fusion (HF) module. DTM utilizes multiple class tokens within the visual transformer framework to capture diverse embedding representations, thereby facilitating the integration of fine-grained information essential for reliable cross-modality correspondences. DTNL enhances intra-modality and inter-modality consistency among diverse tokens by refining neighborhood sets with insights from neighboring tokens and camera information, promoting robust neighborhood learning and fostering discriminative identity information. Additionally, the HF module consolidates clusters of the same identity while effectively separating those of different identities. Extensive experiments on the publicly available SYSU-MM01 and RegDB datasets demonstrate the efficacy of the proposed method.
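The core architectural idea behind DTM — prepending several learnable class tokens to a vision transformer so each token captures a distinct embedding view of the same person image — can be sketched as follows. This is a minimal illustration only, not the authors' implementation; all module names, dimensions, and the two-layer encoder are hypothetical choices for the sketch.

```python
import torch
import torch.nn as nn

class MultiTokenEncoder(nn.Module):
    """Sketch of a ViT-style encoder with K learnable class tokens,
    each producing its own embedding of the input (a simplified,
    hypothetical stand-in for the paper's DTM component)."""

    def __init__(self, num_cls_tokens=4, dim=256, depth=2,
                 heads=4, num_patches=16):
        super().__init__()
        # K class tokens instead of the usual single [CLS] token
        self.cls_tokens = nn.Parameter(torch.randn(1, num_cls_tokens, dim) * 0.02)
        # learned positional embedding over tokens + patches
        self.pos = nn.Parameter(torch.randn(1, num_cls_tokens + num_patches, dim) * 0.02)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=dim * 4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.num_cls_tokens = num_cls_tokens

    def forward(self, patches):
        # patches: (B, num_patches, dim) pre-embedded patch features
        b = patches.size(0)
        tokens = self.cls_tokens.expand(b, -1, -1)
        x = torch.cat([tokens, patches], dim=1) + self.pos
        x = self.encoder(x)
        # return one embedding per class token: (B, K, dim);
        # these diverse embeddings would then feed cross-modality matching
        return x[:, :self.num_cls_tokens]
```

In such a design, the K per-token embeddings of a visible image and an infrared image can each be compared, so a cross-modal correspondence is supported by several fine-grained views rather than a single global feature.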

Published

2025-04-11

How to Cite

Wang, X., Liu, L., Yang, B., Ye, M., Wang, Z., & Xu, X. (2025). TokenMatcher: Diverse Tokens Matching for Unsupervised Visible-Infrared Person Re-Identification. Proceedings of the AAAI Conference on Artificial Intelligence, 39(8), 7934-7942. https://doi.org/10.1609/aaai.v39i8.32855

Section

AAAI Technical Track on Computer Vision VII