Semantic-Driven Visual Progressive Refinement for Aerial-Ground Person ReID: A Challenging Large-Scale Benchmark

Authors

  • Aihua Zheng, Key Laboratory of Intelligent Computing & Signal Processing, Ministry of Education, Anhui University; School of Artificial Intelligence, Anhui University; Anhui Provincial Key Laboratory of Multimodal Cognitive Computation
  • Hao Xie, Key Laboratory of Intelligent Computing & Signal Processing, Ministry of Education, Anhui University; Anhui Provincial Key Laboratory of Multimodal Cognitive Computation; School of Computer Science and Technology, Anhui University
  • Xixi Wan, School of Artificial Intelligence, Anhui University
  • Zi Wang, Key Laboratory of Intelligent Computing & Signal Processing, Ministry of Education, Anhui University; School of Biomedical Engineering, Anhui Medical University
  • Shihao Li, School of Artificial Intelligence, Anhui University
  • Jin Tang, Key Laboratory of Intelligent Computing & Signal Processing, Ministry of Education, Anhui University; Anhui Provincial Key Laboratory of Multimodal Cognitive Computation; School of Computer Science and Technology, Anhui University
  • Bin Luo, Key Laboratory of Intelligent Computing & Signal Processing, Ministry of Education, Anhui University; Anhui Provincial Key Laboratory of Multimodal Cognitive Computation; School of Computer Science and Technology, Anhui University

DOI:

https://doi.org/10.1609/aaai.v40i16.38339

Abstract

Aerial-Ground Person Re-IDentification (AGPReID) aims to extract identity-discriminative representations from heterogeneous perspectives across different platforms in complex real-world environments. However, existing methods primarily focus on visual appearance modeling and make insufficient use of semantic attribute priors, which limits their ability to bridge the aerial-ground view gap. To address this limitation, we propose a Semantic-driven Visual Progressive Refinement framework for AGPReID (SVPR-ReID), which effectively leverages textual attribute priors to guide the extraction of fine-grained visual cues. Specifically, we design a View-Decoupled Feature Extractor that incorporates view-aware textual prompts to decouple view-invariant identity features. Then, to alleviate inter-class ambiguity, we propose an Attribute-Scattered Mixture-of-Experts module that integrates attribute semantics into the visual space, thereby improving discrimination among visually similar pedestrians. Finally, we design a Context-Vision Progressive Refinement module that progressively refines attribute and view-invariant features to obtain robust cross-view identity representations. In addition, we contribute a comprehensive benchmark for AGPReID, named CP2108, which contains 142,817 images of 2,108 identities annotated with 22 attributes. Notably, it includes 191 identities captured at different times, enabling both short- and long-term ReID evaluation and addressing the limitation of existing datasets, which cover only short-term scenarios. Extensive experimental results validate the effectiveness of our SVPR-ReID on four AGPReID datasets.
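The abstract outlines a three-stage pipeline: view-decoupled feature extraction guided by view-aware textual prompts, attribute-scattered mixture-of-experts fusion, and context-vision progressive refinement. The paper's implementation is not reproduced on this page, so the PyTorch fragment below is only a minimal, hypothetical sketch of how attribute text embeddings might gate a mixture of visual experts; the module name, dimensions, and routing rule are assumptions inferred from the abstract, not the authors' SVPR-ReID code.

```python
# Hypothetical sketch of an attribute-gated mixture-of-experts fusion step.
# Not the authors' SVPR-ReID implementation: names, sizes, and the routing
# rule are illustrative assumptions inferred only from the abstract.
import torch
import torch.nn as nn


class AttributeGatedMoE(nn.Module):
    """Routes a visual feature through several experts, with routing weights
    conditioned on attribute text embeddings (e.g., from a text encoder)."""

    def __init__(self, dim: int = 512, num_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
             for _ in range(num_experts)]
        )
        # The gate sees the visual feature concatenated with pooled attribute semantics.
        self.gate = nn.Linear(2 * dim, num_experts)

    def forward(self, visual: torch.Tensor, attr_text: torch.Tensor) -> torch.Tensor:
        # visual:    (B, dim)     pooled view-invariant visual feature
        # attr_text: (B, K, dim)  embeddings of K attribute phrases per image
        attr_ctx = attr_text.mean(dim=1)                                       # (B, dim)
        weights = torch.softmax(
            self.gate(torch.cat([visual, attr_ctx], dim=-1)), dim=-1)          # (B, E)
        expert_out = torch.stack([e(visual) for e in self.experts], dim=1)     # (B, E, dim)
        fused = (weights.unsqueeze(-1) * expert_out).sum(dim=1)                # (B, dim)
        return visual + fused  # residual fusion of attribute-aware refinement


if __name__ == "__main__":
    moe = AttributeGatedMoE(dim=512, num_experts=4)
    v = torch.randn(8, 512)        # batch of visual features
    t = torch.randn(8, 22, 512)    # 22 attribute embeddings, matching CP2108's 22 attributes
    print(moe(v, t).shape)         # torch.Size([8, 512])
```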

Published

2026-03-14

How to Cite

Zheng, A., Xie, H., Wan, X., Wang, Z., Li, S., Tang, J., & Luo, B. (2026). Semantic-Driven Visual Progressive Refinement for Aerial-Ground Person ReID: A Challenging Large-Scale Benchmark. Proceedings of the AAAI Conference on Artificial Intelligence, 40(16), 13360–13368. https://doi.org/10.1609/aaai.v40i16.38339

Issue

Vol. 40 No. 16 (2026)

Section

AAAI Technical Track on Computer Vision XIII