SeqWalker: Sequential-Horizon Vision-and-Language Navigation with Hierarchical Planning

Authors

  • Zebin Han (North University of China; State Key Laboratory of Robotics and Intelligent Systems, Shenyang Institute of Automation, Chinese Academy of Sciences; SouthEast University)
  • Xudong Wang (State Key Laboratory of Robotics and Intelligent Systems, Shenyang Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences)
  • Baichen Liu (State Key Laboratory of Robotics and Intelligent Systems, Shenyang Institute of Automation, Chinese Academy of Sciences)
  • Qi Lyu (State Key Laboratory of Robotics and Intelligent Systems, Shenyang Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences)
  • Zhenduo Shang (State Key Laboratory of Robotics and Intelligent Systems, Shenyang Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences)
  • Jiahua Dong (Mohamed bin Zayed University of Artificial Intelligence)
  • Lianqing Liu (State Key Laboratory of Robotics and Intelligent Systems, Shenyang Institute of Automation, Chinese Academy of Sciences)
  • Zhi Han (State Key Laboratory of Robotics and Intelligent Systems, Shenyang Institute of Automation, Chinese Academy of Sciences)

DOI:

https://doi.org/10.1609/aaai.v40i22.38891

Abstract

Sequential-Horizon Vision-and-Language Navigation (SH-VLN) is a challenging setting in which an agent must complete several navigation tasks in sequence, guided by a single complex, long-horizon natural language instruction. Current vision-and-language navigation models degrade significantly on such instructions, because information overload impairs the agent's ability to attend to the details relevant to its current observations. To address this problem, we propose SeqWalker, a novel navigation model built on a hierarchical planning framework. SeqWalker features: (1) a High-Level Planner that dynamically selects contextually relevant sub-instructions from the global instruction based on the agent's current visual observations, thus reducing cognitive load; and (2) a Low-Level Planner incorporating an Exploration-Verification strategy that leverages the inherent logical structure of instructions for trajectory error correction. To evaluate SH-VLN performance, we also extend the IVLN dataset and establish a new benchmark. Extensive experiments demonstrate the effectiveness and superiority of SeqWalker.
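
As a rough illustration of the High-Level Planner idea described above, the following toy sketch picks the sub-instruction that best matches what the agent currently sees. It is not the SeqWalker implementation: the sentence-level splitting, the word-overlap score, and the names `split_instruction`, `select_sub_instruction`, and `observed_labels` are all assumptions made for this example, and the abstract does not specify how the actual model performs the selection.

```python
import re


def split_instruction(instruction: str) -> list[str]:
    """Split a long instruction into sub-instructions.

    Assumption: one sentence (or semicolon-separated clause) per sub-instruction.
    """
    parts = re.split(r"[.;]\s*", instruction)
    return [p.strip() for p in parts if p.strip()]


def select_sub_instruction(sub_instructions: list[str], observed_labels: set[str]) -> str:
    """Return the sub-instruction sharing the most words with the observed labels.

    Assumption: observations are reduced to a set of lowercase object labels,
    and relevance is plain word overlap (a stand-in for a learned matcher).
    """
    def overlap(text: str) -> int:
        words = set(re.findall(r"[a-z]+", text.lower()))
        return len(words & observed_labels)

    # max() returns the earliest item on ties, so instruction order is preferred.
    return max(sub_instructions, key=overlap)


instruction = (
    "Walk past the sofa and stop at the door. "
    "Go up the stairs. "
    "Enter the kitchen and wait by the fridge."
)
subs = split_instruction(instruction)
current = select_sub_instruction(subs, {"stairs", "railing"})
print(current)
```

With the hypothetical observation `{"stairs", "railing"}`, only the second sub-instruction shares a word with the observed labels, so the agent would be handed that single step rather than the full three-part instruction.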

Published

2026-03-14

How to Cite

Han, Z., Wang, X., Liu, B., Lyu, Q., Shang, Z., Dong, J., … Han, Z. (2026). SeqWalker: Sequential-Horizon Vision-and-Language Navigation with Hierarchical Planning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(22), 18279–18287. https://doi.org/10.1609/aaai.v40i22.38891

Issue

Vol. 40 No. 22 (2026)

Section

AAAI Technical Track on Intelligent Robotics