Semantic-Driven Visual Progressive Refinement for Aerial-Ground Person ReID: A Challenging Large-Scale Benchmark
DOI:
https://doi.org/10.1609/aaai.v40i16.38339
Abstract
Aerial-Ground Person Re-IDentification (AGPReID) aims to extract identity-discriminative representations from heterogeneous perspectives across different platforms in complex real-world environments. However, existing methods primarily focus on visual appearance modeling and make insufficient use of semantic attribute priors, which limits their ability to bridge the aerial-ground view gap. To address this limitation, we propose a Semantic-driven Visual Progressive Refinement framework for AGPReID (SVPR-ReID), which effectively leverages textual attribute priors to guide the extraction of fine-grained visual cues. Specifically, we design a View-Decoupled Feature Extractor that incorporates view-aware textual prompts to decouple view-invariant identity features. Then, to alleviate inter-class ambiguity, we propose an Attribute-Scattered Mixture-of-Experts module that integrates attribute semantics into the visual space, thereby improving discrimination among visually similar pedestrians. Finally, we design a Context-Vision Progressive Refinement module for progressive refinement of attribute and view-invariant features, obtaining robust cross-view identity representations. In particular, we contribute a comprehensive benchmark for AGPReID, named CP2108, which contains 142,817 images of 2,108 identities annotated with 22 attributes. Notably, it includes 191 identities captured across different times, enabling both short- and long-term ReID evaluation, addressing the limitation of existing datasets that focus only on short-term scenarios. Extensive experimental results validate the effectiveness of our SVPR-ReID on four AGPReID datasets.
Published
2026-03-14
How to Cite
Zheng, A., Xie, H., Wan, X., Wang, Z., Li, S., Tang, J., & Luo, B. (2026). Semantic-Driven Visual Progressive Refinement for Aerial-Ground Person ReID: A Challenging Large-Scale Benchmark. Proceedings of the AAAI Conference on Artificial Intelligence, 40(16), 13360–13368. https://doi.org/10.1609/aaai.v40i16.38339
Section
AAAI Technical Track on Computer Vision XIII