Driving with Advice: Large Model as Motion Advisor for Joint Planning
DOI:
https://doi.org/10.1609/aaai.v40i2.37088
Abstract
We address the challenge of integrating high-level semantic reasoning with low-level trajectory planning in end-to-end autonomous driving, where most existing frameworks decouple perception, decision-making, and control, leading to limited interpretability and poor instruction compliance. To bridge this gap, we propose Driving with Advice, a novel closed-loop framework that treats a vision-language model (VLM) as a motion advisor to provide interpretable, language-mediated guidance for trajectory generation. Our approach introduces three key innovations: (1) Semantic-Intentional Pretraining (SIP), which injects driving rationale into a compact VLM via machine-generated question-answering pairs; (2) a discrete action space grounded in directional and speed primitives, enabling structured and interpretable policy learning; and (3) an advice-following diffusion policy refined via Group Relative Policy Optimization under a multi-objective reward that ensures safety, comfort, and alignment with semantic intent. We evaluate our method on the NAVSIM benchmark in a closed-loop setting, achieving a state-of-the-art Predictive Driver Model Score (PDMS) of 91.5, outperforming strong baselines in safety (NC: 99.2). The results demonstrate that leveraging language as a cognitive interface between perception and control enhances both generalization and behavioral transparency, advancing the paradigm of language-conditioned driving.
Published
2026-03-14
How to Cite
Wang, J., Yu, J., Lin, H., Liu, H., Zhu, W., & Xiong, S. (2026). Driving with Advice: Large Model as Motion Advisor for Joint Planning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(2), 1168–1176. https://doi.org/10.1609/aaai.v40i2.37088
Issue
Section
AAAI Technical Track on Application Domains II