Agent Journey Beyond RGB: Hierarchical Semantic-Spatial Representation Enrichment for Vision-and-Language Navigation

Xuesong Zhang; Yunbo Xu; Jia Li; Ruonan Liu; Zhenzhen Hu

doi:10.1609/aaai.v40i22.38948

Authors

Xuesong Zhang, Hefei University of Technology
Yunbo Xu, Hefei University of Technology
Jia Li, Hefei University of Technology
Ruonan Liu, Shanghai Jiao Tong University
Zhenzhen Hu, Hefei University of Technology

DOI:

https://doi.org/10.1609/aaai.v40i22.38948

Abstract

Navigating unseen environments from natural language instructions remains difficult for egocentric agents in Vision-and-Language Navigation (VLN). During indoor navigation, humans intuitively ground concrete semantic knowledge within spatial layouts. Although previous studies have introduced diverse environmental representations to enhance reasoning, the additional co-occurring modalities are often naively concatenated with RGB features, resulting in suboptimal utilization of each modality's distinct contribution. Inspired by how humans navigate, we propose a hierarchical Semantic Understanding and Spatial Awareness (SUSA) architecture that enables agents to perceive and ground environments at diverse scales. Specifically, the Textual Semantic Understanding (TSU) module supports local action prediction by generating view-level descriptions, thereby capturing fine-grained environmental semantics and narrowing the modality gap between instructions and environments. Complementarily, the Depth-enhanced Spatial Perception (DSP) module incrementally constructs a trajectory-level depth exploration map, providing the agent with a coarse-grained comprehension of the global spatial layout. Extensive experiments demonstrate that SUSA's hierarchical representation enrichment not only boosts the navigation performance of the baseline on discrete VLN benchmarks (REVERIE, R2R, and SOON), but also exhibits superior generalization to the continuous R2R-CE benchmark.
