Driving with Advice: Large Model as Motion Advisor for Joint Planning

Authors

  • Junyin Wang School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, 430070
  • Jinlei Yu School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, 430074
  • Hao Lin VOYAH Automobile Technology Co., Ltd., Wuhan 430051, China
  • Huikai Liu School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, 430070; VOYAH Automobile Technology Co., Ltd., Wuhan 430051, China
  • Wenqian Zhu VOYAH Automobile Technology Co., Ltd., Wuhan 430051, China
  • Shengwu Xiong School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan, 430070; Interdisciplinary Artificial Intelligence Research Institute, Wuhan College, Wuhan 430212, China

DOI:

https://doi.org/10.1609/aaai.v40i2.37088

Abstract

We address the challenge of integrating high-level semantic reasoning with low-level trajectory planning in end-to-end autonomous driving, where most existing frameworks decouple perception, decision-making, and control, leading to limited interpretability and poor instruction compliance. To bridge this gap, we propose Driving with Advice, a novel closed-loop framework that treats a vision-language model (VLM) as a motion advisor to provide interpretable, language-mediated guidance for trajectory generation. Our approach introduces three key innovations: (1) Semantic-Intentional Pretraining (SIP), which injects driving rationale into a compact VLM via machine-generated question-answering pairs; (2) a discrete action space grounded in directional and speed primitives, enabling structured and interpretable policy learning; and (3) an advice-following diffusion policy refined via Group Relative Policy Optimization under a multi-objective reward that ensures safety, comfort, and alignment with semantic intent. We evaluate our method on the NAVSIM benchmark in a closed-loop setting, achieving a state-of-the-art Predictive Driver Model Score (PDMS) of 91.5, outperforming strong baselines in safety (NC: 99.2). The results demonstrate that leveraging language as a cognitive interface between perception and control enhances both generalization and behavioral transparency, advancing the paradigm of language-conditioned driving.
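
The abstract does not spell out how the group-relative refinement or the multi-objective reward is computed. As a rough illustration only, the sketch below shows the standard group-relative advantage used in Group Relative Policy Optimization: each sampled trajectory's reward is normalized against the mean and standard deviation of its own sampling group. The weighted-sum reward, the three score names, the weights, and the sample values are all assumptions made for this example and are not taken from the paper.

```python
from statistics import mean, pstdev


def combined_reward(safety, comfort, alignment, weights=(0.5, 0.2, 0.3)):
    """Weighted sum of three per-trajectory scores in [0, 1].

    The weighted-sum form and the weights are hypothetical; the paper's
    actual multi-objective reward is not given in the abstract.
    """
    w_s, w_c, w_a = weights
    return w_s * safety + w_c * comfort + w_a * alignment


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    advantage_i = (r_i - mean(group)) / (std(group) + eps)
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical (safety, comfort, alignment) scores for three trajectories
# sampled from the policy for the same scene and advice.
group = [(1.0, 0.8, 1.0), (1.0, 0.6, 0.0), (0.0, 0.9, 1.0)]
rewards = [combined_reward(*scores) for scores in group]
advantages = group_relative_advantages(rewards)
```

Under these assumed weights, the trajectory that is safe and follows the advice receives a positive advantage, while the unsafe one receives a negative advantage, so a policy update would shift probability toward the former without needing a separate value network.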

Published

2026-03-14

How to Cite

Wang, J., Yu, J., Lin, H., Liu, H., Zhu, W., & Xiong, S. (2026). Driving with Advice: Large Model as Motion Advisor for Joint Planning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(2), 1168–1176. https://doi.org/10.1609/aaai.v40i2.37088

Section

AAAI Technical Track on Application Domains II