Lightweight Adaptive Topological Layout and Semantic Mapping in Vision-and-Language Navigation on Websites
DOI:
https://doi.org/10.1609/aaai.v40i22.38901
Abstract
Vision-and-Language navigation on websites requires agents to navigate to target webpages and answer questions based on human instructions. Current web agents primarily leverage Large Language Models (LLMs) for semantic understanding and reasoning, but still suffer from limited navigation performance and slow inference speed. Constructing a global map across webpages can effectively enhance both navigation accuracy and efficiency; however, this is challenged by the open structure of web navigation graphs and the dynamic nature of web layouts. In this paper, we propose ATLAS: Adaptive Topological Layout And Semantic mapping, a framework that adaptively constructs a time-varying, unbounded topological map across webpages and unifies heterogeneous elements through semantic representation. This enables both global path planning and local element selection for web-based navigation and question answering. As a lightweight approach, ATLAS significantly outperforms existing state-of-the-art methods on the WebVLN benchmark with a 10% improvement in success rate, and achieves the highest average task success rate on both the Mind2Web and WebArena benchmarks.
Published
2026-03-14
How to Cite
Lai, P., Xie, Z., & Yang, H. (2026). Lightweight Adaptive Topological Layout and Semantic Mapping in Vision-and-Language Navigation on Websites. Proceedings of the AAAI Conference on Artificial Intelligence, 40(22), 18370–18378. https://doi.org/10.1609/aaai.v40i22.38901
Issue
Section
AAAI Technical Track on Intelligent Robotics