AR-Nav Benchmark: Augmented Reality Navigation with Vision and Language
DOI:
https://doi.org/10.1609/aaai.v40i21.38849
Abstract
Augmented Reality (AR) navigation has emerged as a transformative tool for spatial intelligence, enabling users to interactively explore complex environments through wearable and mobile AR devices. However, current AR navigation systems struggle with low indoor localization accuracy, weak semantic understanding, and limited long-term memory, which severely limits their adaptability in dynamic, multi-floor, and large-scale real-world settings. To address these challenges, we present the AR-Nav benchmark, a novel dataset with a corresponding suite that leverages vision and language for AR navigation. First, to construct this benchmark, we propose an Augmented Reality Visual-Language Memory Model (AR‑VLM²), which generates structured, semantically rich, and temporally indexed representations for long-term AR navigation. Second, we design a lightweight navigation intent recommendation module, called ARN‑Pilot, which uses hierarchical topological reasoning and language-grounded path planning to enable low-latency, personalized route selection. Third, we introduce a closed-loop AR interaction module that supports real-time multi-modal feedback, dynamic memory updates, and human-in-the-loop query refinement. Extensive experiments in indoor multi-floor and outdoor parking scenarios show that the AR-Nav suite significantly outperforms state-of-the-art AR navigation methods.
Published
2026-03-14
How to Cite
Yan, L., Wu, Y., Xu, C., Yang, C., Zhang, J., & Li, P. (2026). AR-Nav Benchmark: Augmented Reality Navigation with Vision and Language. Proceedings of the AAAI Conference on Artificial Intelligence, 40(21), 17904–17912. https://doi.org/10.1609/aaai.v40i21.38849
Issue
Section
AAAI Technical Track on Humans and AI