CoT-VLNBench: A Benchmark for Visual Chain-of-Thought Reasoning in Vision-Language-Navigation Robots
DOI:
https://doi.org/10.1609/aaai.v40i43.40980Abstract
Recent advances in vision language models (VLMs) have demonstrated remarkable potential in embodied navigation tasks. However, existing robot-centric datasets primarily focus on traditional 3D tasks such as perception and prediction, lacking adequate support for vision-language tasks. Vision-language-navigation (VLN) is a key capability for achieving human-like and interpretable navigation in complex environments. In this study, we present CoT-VLNBench, the first large-scale benchmark and dataset designed for chain-of-thought (CoT) reasoning in quadruped robot navigation. Our dataset encompasses a diverse range of indoor and outdoor scenes, multi-step navigation trajectories, and rich natural language instructions, all annotated with fine-grained CoT reasoning traces. Specifically, it contains 175K frames, 5.25M 3D bounding boxes, and 875K vision–question–answer (VQA) pairs. This comprehensive resource enables thorough evaluation of embodied agents’ perceptual and step-by-step reasoning abilities. Furthermore, we propose a novel CoT-VLN model, a state-of-the-art 7B VLN model that integrates visual, linguistic, and reasoning modules, to facilitate interpretable and effective navigation. Extensive experiments demonstrate that our approach significantly outperforms existing non-VLMs baselines on the new benchmark, underscoring the importance of CoT-VLN in embodied navigation. We hope that CoT-VLNBench will serve as a valuable resource to advance research at the intersection of robotics, vision, language, and reasoning.Downloads
Published
2026-03-14
How to Cite
Zhao, X., Liu, C., Ji, R., Zhang, Z., Zhu, M., Song, L., … Su, K. (2026). CoT-VLNBench: A Benchmark for Visual Chain-of-Thought Reasoning in Vision-Language-Navigation Robots. Proceedings of the AAAI Conference on Artificial Intelligence, 40(43), 36573–36581. https://doi.org/10.1609/aaai.v40i43.40980
Issue
Section
AAAI Technical Track on Planning, Routing, and Scheduling