AR-Nav Benchmark: Augmented Reality Navigation with Vision and Language

Liqi Yan; Yihao Wu; Chenyi Xu; Chao Yang; Jianhui Zhang; Pan Li

doi:10.1609/aaai.v40i21.38849

Authors

Liqi Yan, Hangzhou Dianzi University
Yihao Wu, Hangzhou Dianzi University
Chenyi Xu, Hangzhou Dianzi University
Chao Yang, Hangzhou Dianzi University
Jianhui Zhang, Hangzhou Dianzi University
Pan Li, Hangzhou Dianzi University

DOI:

https://doi.org/10.1609/aaai.v40i21.38849

Abstract

Augmented Reality (AR) navigation has emerged as a transformative tool for spatial intelligence, enabling users to interactively explore complex environments through wearable and mobile AR devices. However, current AR navigation systems struggle with low indoor localization accuracy, weak semantic understanding, and limited long-term memory, which severely limits their adaptability in dynamic, multi-floor, and large-scale real-world settings. To address these challenges, we present the AR-Nav benchmark, a novel dataset with a corresponding suite that leverages vision and language for AR navigation. First, to construct this benchmark, we propose an Augmented Reality Visual-Language Memory Model (AR‑VLM²), which generates structured, semantically rich, and temporally indexed representations for long-term AR navigation. Second, we design ARN‑Pilot, a lightweight navigation intent recommendation module with hierarchical topological reasoning and language-grounded path planning, which enables low-latency and personalized route selection. Third, we introduce a closed-loop AR interaction module that supports real-time multi-modal feedback, dynamic memory updates, and human-in-the-loop query refinement. Extensive experiments in indoor multi-floor and outdoor parking scenarios show that the AR-Nav suite significantly outperforms state-of-the-art AR navigation methods.
