MonoDream: Monocular Vision-Language Navigation with Panoramic Dreaming

Shuo Wang; Yongcai Wang; Zhaoxin Fan; Yucheng Wang; Maiyue Chen; Kaihui Wang; Zhizhong Su; Wanting Li; Xudong Cai; Yeying Jin; Deying Li

doi:10.1609/aaai.v40i12.37974

Authors

Shuo Wang, Renmin University of China
Yongcai Wang, Renmin University of China
Zhaoxin Fan, Innovation Center for Future Blockchain and Privacy Computing
Yucheng Wang, Horizon Robotics
Maiyue Chen, Horizon Robotics
Kaihui Wang, Horizon Robotics
Zhizhong Su, Horizon Robotics
Wanting Li, Renmin University of China
Xudong Cai, Renmin University of China
Yeying Jin, National University of Singapore
Deying Li, Renmin University of China

Abstract

Vision-Language Navigation (VLN) tasks often leverage panoramic RGB and depth inputs to provide rich spatial cues for action planning, but these sensors can be costly or less accessible in real-world deployments. Recent approaches based on Vision-Language-Action (VLA) models achieve strong results with monocular input, yet they still lag behind methods that use panoramic RGB-D information. We present MonoDream, a lightweight VLA framework that enables monocular agents to learn a Unified Navigation Representation (UNR). This shared feature representation jointly aligns navigation-relevant visual semantics (e.g., global layout, depth, and future cues) with language-grounded action intent, enabling more reliable action prediction. MonoDream further introduces Latent Panoramic Dreaming (LPD) tasks to supervise the UNR; these tasks train the model to predict latent features of panoramic RGB and depth observations at both the current and future steps from monocular input alone. Experiments on multiple VLN benchmarks show that MonoDream consistently improves monocular navigation performance and significantly narrows the gap with panoramic-based agents.
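
To make the training setup described in the abstract more concrete, here is a minimal Python sketch of how an action-prediction loss might be combined with the four LPD targets (panoramic RGB and depth latents at the current step and at a future step). It assumes that each LPD task is a plain regression of a predicted latent vector onto a target latent taken from the real panoramic observation (for example, produced by a frozen encoder), scored with mean squared error and averaged with equal weights. The class and function names (`LPDTargets`, `UNRPredictions`, `monodream_style_loss`), the MSE objective, and the `lpd_weight` parameter are all hypothetical; the abstract does not specify MonoDream's actual objectives, heads, or weighting.

```python
import math
from dataclasses import dataclass

# Hypothetical representation: a "latent" is just a flat list of floats here.
Vector = list[float]


def mse(pred: Vector, target: Vector) -> float:
    """Mean squared error between two equal-length latent vectors."""
    if len(pred) != len(target) or not pred:
        raise ValueError("latents must be non-empty and the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cross_entropy(logits: Vector, label: int) -> float:
    """Softmax cross-entropy for one action label, using log-sum-exp for stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


@dataclass
class LPDTargets:
    """Assumed targets: latents of the true panoramic observations (only needed in training)."""
    pano_rgb_now: Vector
    pano_depth_now: Vector
    pano_rgb_future: Vector
    pano_depth_future: Vector


@dataclass
class UNRPredictions:
    """Assumed output heads decoded from a shared representation of the monocular input."""
    action_logits: Vector
    pano_rgb_now: Vector
    pano_depth_now: Vector
    pano_rgb_future: Vector
    pano_depth_future: Vector


def monodream_style_loss(
    pred: UNRPredictions,
    targets: LPDTargets,
    action_label: int,
    lpd_weight: float = 1.0,  # hypothetical weighting, not taken from the paper
) -> float:
    """Action loss plus the equally weighted mean of the four latent-dreaming losses."""
    action_loss = cross_entropy(pred.action_logits, action_label)
    lpd_losses = [
        mse(pred.pano_rgb_now, targets.pano_rgb_now),
        mse(pred.pano_depth_now, targets.pano_depth_now),
        mse(pred.pano_rgb_future, targets.pano_rgb_future),
        mse(pred.pano_depth_future, targets.pano_depth_future),
    ]
    return action_loss + lpd_weight * sum(lpd_losses) / len(lpd_losses)


if __name__ == "__main__":
    # Toy example: two candidate actions with uniform logits, exact latent predictions.
    preds = UNRPredictions(
        action_logits=[0.0, 0.0],
        pano_rgb_now=[1.0, 2.0],
        pano_depth_now=[0.5],
        pano_rgb_future=[1.0, 2.0],
        pano_depth_future=[0.5],
    )
    targets = LPDTargets(
        pano_rgb_now=[1.0, 2.0],
        pano_depth_now=[0.5],
        pano_rgb_future=[1.0, 2.0],
        pano_depth_future=[0.5],
    )
    # Only the action term remains here, so this prints ln 2.
    print(monodream_style_loss(preds, targets, action_label=0))
```

In a setup like this, the panoramic targets enter only the auxiliary loss, so panoramic RGB-D data would be needed at training time while the agent acts from the monocular branch alone. That split is an inference from the abstract's description of predicting panoramic features from monocular input, not a confirmed detail of MonoDream's implementation.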
