Semantic-Driven Visual Progressive Refinement for Aerial-Ground Person ReID: A Challenging Large-Scale Benchmark

Authors

  • Aihua Zheng, Key Laboratory of Intelligent Computing & Signal Processing, Ministry of Education, Anhui University; School of Artificial Intelligence, Anhui University; Anhui Provincial Key Laboratory of Multimodal Cognitive Computation
  • Hao Xie, Key Laboratory of Intelligent Computing & Signal Processing, Ministry of Education, Anhui University; Anhui Provincial Key Laboratory of Multimodal Cognitive Computation; School of Computer Science and Technology, Anhui University
  • Xixi Wan, School of Artificial Intelligence, Anhui University
  • Zi Wang, Key Laboratory of Intelligent Computing & Signal Processing, Ministry of Education, Anhui University; School of Biomedical Engineering, Anhui Medical University
  • Shihao Li, School of Artificial Intelligence, Anhui University
  • Jin Tang, Key Laboratory of Intelligent Computing & Signal Processing, Ministry of Education, Anhui University; Anhui Provincial Key Laboratory of Multimodal Cognitive Computation; School of Computer Science and Technology, Anhui University
  • Bin Luo, Key Laboratory of Intelligent Computing & Signal Processing, Ministry of Education, Anhui University; Anhui Provincial Key Laboratory of Multimodal Cognitive Computation; School of Computer Science and Technology, Anhui University

DOI:

https://doi.org/10.1609/aaai.v40i16.38339

Abstract

Aerial-Ground Person Re-IDentification (AGPReID) aims to extract identity-discriminative representations from heterogeneous perspectives across different platforms in complex real-world environments. However, existing methods primarily focus on visual appearance modeling and make insufficient use of semantic attribute priors, which limits their ability to bridge the aerial-ground view gap. To address this limitation, we propose a Semantic-driven Visual Progressive Refinement framework for AGPReID (SVPR-ReID), which effectively leverages textual attribute priors to guide the extraction of fine-grained visual cues. Specifically, we design a View-Decoupled Feature Extractor that incorporates view-aware textual prompts to decouple view-invariant identity features. Then, to alleviate inter-class ambiguity, we propose an Attribute-Scattered Mixture-of-Experts module that integrates attribute semantics into the visual space, thereby improving discrimination among visually similar pedestrians. Finally, we design a Context-Vision Progressive Refinement module that progressively refines attribute and view-invariant features to obtain robust cross-view identity representations. In addition, we contribute a comprehensive benchmark for AGPReID, named CP2108, which contains 142,817 images of 2,108 identities annotated with 22 attributes. Notably, it includes 191 identities captured at different times, enabling both short- and long-term ReID evaluation and addressing the limitation of existing datasets, which cover only short-term scenarios. Extensive experimental results validate the effectiveness of our SVPR-ReID on four AGPReID datasets.
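The abstract outlines a three-stage pipeline: view-decoupled feature extraction guided by view-aware textual prompts, attribute-scattered mixture-of-experts fusion, and context-vision progressive refinement. The paper's implementation is not reproduced on this page, so the PyTorch fragment below is only a minimal, hypothetical sketch of how attribute text embeddings might gate a mixture of visual experts; the module name, dimensions, and routing rule are assumptions inferred from the abstract, not the authors' SVPR-ReID code.

```python
# Hypothetical sketch of an attribute-gated mixture-of-experts fusion step.
# Not the authors' SVPR-ReID implementation: names, sizes, and the routing
# rule are illustrative assumptions inferred only from the abstract.
import torch
import torch.nn as nn


class AttributeGatedMoE(nn.Module):
    """Routes a visual feature through several experts, with routing weights
    conditioned on attribute text embeddings (e.g., from a text encoder)."""

    def __init__(self, dim: int = 512, num_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
             for _ in range(num_experts)]
        )
        # The gate sees the visual feature concatenated with pooled attribute semantics.
        self.gate = nn.Linear(2 * dim, num_experts)

    def forward(self, visual: torch.Tensor, attr_text: torch.Tensor) -> torch.Tensor:
        # visual:    (B, dim)     pooled view-invariant visual feature
        # attr_text: (B, K, dim)  embeddings of K attribute phrases per image
        attr_ctx = attr_text.mean(dim=1)                                       # (B, dim)
        weights = torch.softmax(
            self.gate(torch.cat([visual, attr_ctx], dim=-1)), dim=-1)          # (B, E)
        expert_out = torch.stack([e(visual) for e in self.experts], dim=1)     # (B, E, dim)
        fused = (weights.unsqueeze(-1) * expert_out).sum(dim=1)                # (B, dim)
        return visual + fused  # residual fusion of attribute-aware refinement


if __name__ == "__main__":
    moe = AttributeGatedMoE(dim=512, num_experts=4)
    v = torch.randn(8, 512)        # batch of visual features
    t = torch.randn(8, 22, 512)    # 22 attribute embeddings, matching CP2108's 22 attributes
    print(moe(v, t).shape)         # torch.Size([8, 512])
```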

Published

2026-03-14

How to Cite

Zheng, A., Xie, H., Wan, X., Wang, Z., Li, S., Tang, J., & Luo, B. (2026). Semantic-Driven Visual Progressive Refinement for Aerial-Ground Person ReID: A Challenging Large-Scale Benchmark. Proceedings of the AAAI Conference on Artificial Intelligence, 40(16), 13360–13368. https://doi.org/10.1609/aaai.v40i16.38339

Issue

Vol. 40 No. 16 (2026)

Section

AAAI Technical Track on Computer Vision XIII