Towards Autonomous UAV Visual Object Search in City Space: Benchmark and Agentic Methodology

Authors

  • Yatai Ji, National University of Defense Technology
  • Zhengqiu Zhu, National University of Defense Technology
  • Yong Zhao, National University of Defense Technology
  • Beidan Liu, National University of Defense Technology
  • Chen Gao, Tsinghua University
  • Yihao Zhao, Tsinghua University
  • Sihang Qiu, National University of Defense Technology
  • Yue Hu, National University of Defense Technology
  • Quanjun Yin, National University of Defense Technology

DOI:

https://doi.org/10.1609/aaai.v40i22.38898

Abstract

Aerial Visual Object Search (AVOS) tasks in urban environments require Unmanned Aerial Vehicles (UAVs) to autonomously search for and identify target objects based on visual inputs without external guidance. Existing approaches struggle in complex urban environments due to redundant semantic processing, ambiguity among similar objects, and the exploration-exploitation dilemma. To advance research on the AVOS task, we introduce CityAVOS, the first benchmark dataset for autonomous search of static urban objects. It features 2,420 tasks of varying difficulty across six object categories, designed to rigorously evaluate UAV search strategies. To solve the AVOS task, we also propose PRPSearcher (Perception-Reasoning-Planning Searcher), a novel agentic method powered by multi-modal large language models (MLLMs) that enables a UAV agent to reason over visual cues as humans do when searching for objects. Specifically, PRPSearcher constructs three specialized maps: an object-centric dynamic semantic map enhancing spatial perception, a 3D cognitive map based on semantic "attraction" values for target reasoning, and a 3D uncertainty map for balanced exploration-exploitation search. Moreover, we propose a denoising mechanism to mitigate interference from similar objects and design an Inspiration Promote Thought prompting mechanism for adaptive action planning. Experimental results on CityAVOS demonstrate that PRPSearcher surpasses existing baselines in both success rate and search efficiency (on average: +37.69% SR, +28.96% SPL, -30.69% MSS, and -46.40% NE). Our work paves the way for future advances in embodied visual target search.
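The abstract describes balancing a semantic "attraction" map (exploitation) against an uncertainty map (exploration) when choosing where to search next. As a rough illustration only (not the authors' implementation; the `next_waypoint` function, the weighted-sum scoring, and the `alpha` parameter are all assumptions), one minimal way to fuse two such per-cell maps is:

```python
# Hypothetical sketch: fuse an "attraction" map with an uncertainty map to
# pick the next search cell. Higher alpha weights exploitation more heavily.
def next_waypoint(attraction, uncertainty, alpha=0.7):
    """Score each grid cell by alpha * attraction + (1 - alpha) * uncertainty
    and return the highest-scoring cell."""
    best_cell, best_score = None, float("-inf")
    for cell, attract in attraction.items():
        score = alpha * attract + (1 - alpha) * uncertainty.get(cell, 0.0)
        if score > best_score:
            best_cell, best_score = cell, score
    return best_cell

# Toy 2x2 grid: cell (1, 0) has high attraction, (0, 1) is largely unexplored.
attraction = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.9, (1, 1): 0.3}
uncertainty = {(0, 0): 0.2, (0, 1): 0.95, (1, 0): 0.1, (1, 1): 0.4}
print(next_waypoint(attraction, uncertainty))        # exploitation wins: (1, 0)
print(next_waypoint(attraction, uncertainty, 0.0))   # pure exploration: (0, 1)
```

With a high `alpha` the agent heads for the most semantically promising cell; lowering it drives the agent toward unexplored regions, which is the trade-off the uncertainty map is described as mediating.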

Published

2026-03-14

How to Cite

Ji, Y., Zhu, Z., Zhao, Y., Liu, B., Gao, C., Zhao, Y., … Yin, Q. (2026). Towards Autonomous UAV Visual Object Search in City Space: Benchmark and Agentic Methodology. Proceedings of the AAAI Conference on Artificial Intelligence, 40(22), 18342–18350. https://doi.org/10.1609/aaai.v40i22.38898

Section

AAAI Technical Track on Intelligent Robotics