Driving with Advice: Large Model as Motion Advisor for Joint Planning

Junyin Wang; Jinlei Yu; Hao Lin; Huikai Liu; Wenqian Zhu; Shengwu Xiong

doi:10.1609/aaai.v40i2.37088

Authors

Junyin Wang, School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, 430070
Jinlei Yu, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, 430074
Hao Lin, VOYAH Automobile Technology Co., Ltd., Wuhan 430051, China
Huikai Liu, School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, 430070; VOYAH Automobile Technology Co., Ltd., Wuhan 430051, China
Wenqian Zhu, VOYAH Automobile Technology Co., Ltd., Wuhan 430051, China
Shengwu Xiong, School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, 430070; Interdisciplinary Artificial Intelligence Research Institute, Wuhan College, Wuhan 430212, China

DOI:

https://doi.org/10.1609/aaai.v40i2.37088

Abstract

We address the challenge of integrating high-level semantic reasoning with low-level trajectory planning in end-to-end autonomous driving, where most existing frameworks decouple perception, decision-making, and control, leading to limited interpretability and poor instruction compliance. To bridge this gap, we propose Driving with Advice, a novel closed-loop framework that treats a vision-language model (VLM) as a motion advisor to provide interpretable, language-mediated guidance for trajectory generation. Our approach introduces three key innovations: (1) Semantic-Intentional Pretraining (SIP), which injects driving rationale into a compact VLM via machine-generated question-answering pairs; (2) a discrete action space grounded in directional and speed primitives, enabling structured and interpretable policy learning; and (3) an advice-following diffusion policy refined via Group Relative Policy Optimization under a multi-objective reward that ensures safety, comfort, and alignment with semantic intent. We evaluate our method on the NAVSIM benchmark in a closed-loop setting, achieving a state-of-the-art Predictive Driver Model Score (PDMS) of 91.5, outperforming strong baselines in safety (NC: 99.2). The results demonstrate that leveraging language as a cognitive interface between perception and control enhances both generalization and behavioral transparency, advancing the paradigm of language-conditioned driving.
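
The group-relative step in Group Relative Policy Optimization can be illustrated with a minimal sketch. It assumes that several candidate trajectories are sampled for the same scene, each is scored by a scalar reward, and each reward is normalized against the mean and standard deviation of its own group. The function names, the weighted-sum reward, and the weights below are hypothetical illustrations; the abstract does not specify the actual form of the multi-objective reward.

```python
from statistics import mean, pstdev


def combined_reward(safety, comfort, alignment, weights=(0.5, 0.2, 0.3)):
    """Hypothetical weighted sum of the three objectives named in the abstract.

    The weights are placeholders chosen for illustration only.
    """
    w_safety, w_comfort, w_alignment = weights
    return w_safety * safety + w_comfort * comfort + w_alignment * alignment


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the statistics of its own sample group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three candidate trajectories for one scene, scored as
# (safety, comfort, alignment); the values are made up.
scores = [(1.0, 0.8, 0.9), (0.6, 0.9, 0.4), (0.9, 0.5, 0.7)]
rewards = [combined_reward(*s) for s in scores]
advantages = group_relative_advantages(rewards)
```

Trajectories that score above the group mean receive positive advantages and those below receive negative ones, so no separate learned value baseline is needed for this step.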
