WorldRFT: Latent World Model Planning with Reinforcement Fine-Tuning for Autonomous Driving

Pengxuan Yang; Ben Lu; Zhongpu Xia; Chao Han; Yinfeng Gao; Teng Zhang; Kun Zhan; Xianpeng Lang; Yupeng Zheng; Qichao Zhang

doi:10.1609/aaai.v40i14.38149

Authors

Pengxuan Yang: The State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, CAS; School of Advanced Interdisciplinary Sciences, UCAS; Li Auto
Ben Lu: Li Auto
Zhongpu Xia: The State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, CAS
Chao Han: Li Auto
Yinfeng Gao: The State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, CAS
Teng Zhang: Li Auto
Kun Zhan: Li Auto
Xianpeng Lang: Li Auto
Yupeng Zheng: The State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, CAS
Qichao Zhang: The State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, CAS; School of Artificial Intelligence, UCAS

Abstract

Latent World Models enhance scene representation through temporal self-supervised learning, presenting a perception annotation-free paradigm for end-to-end autonomous driving. However, reconstruction-oriented representation learning entangles perception with planning tasks, leading to suboptimal optimization for planning. To address this challenge, we propose WorldRFT, a planning-oriented latent world model framework that aligns scene representation learning with planning via a hierarchical planning decomposition and a local-aware interactive refinement mechanism, augmented by reinforcement learning fine-tuning (RFT) to enhance safety-critical policy performance. Specifically, WorldRFT integrates a vision-geometry foundation model to improve 3D spatial awareness, employs hierarchical planning task decomposition to guide representation optimization, and utilizes local-aware iterative refinement to derive a planning-oriented driving policy. Furthermore, we introduce Group Relative Policy Optimization (GRPO), which applies trajectory Gaussianization and collision-aware rewards to fine-tune the driving policy, yielding systematic improvements in safety. WorldRFT achieves state-of-the-art (SOTA) performance on both the open-loop nuScenes and closed-loop NavSim benchmarks. On nuScenes, it reduces the collision rate by 83% (0.30% → 0.05%). On NavSim, using camera-only sensor input, it attains performance competitive with the LiDAR-based SOTA method DiffusionDrive (87.8 vs. 88.1 PDMS).
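
The abstract names three ingredients of the fine-tuning stage: group-relative advantages (GRPO), a Gaussian treatment of trajectories, and a collision-aware reward. The following is a minimal illustrative sketch of how such pieces are commonly written, not the authors' implementation. Every function name, the isotropic Gaussian with a fixed `sigma`, the binary reward values, and the `safe_radius` threshold are assumptions made for illustration only.

```python
import math

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group: (r - mean) / (std + eps)."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]

def gaussian_log_prob(trajectory, mean_trajectory, sigma):
    """Log-density of 2D waypoints under an isotropic Gaussian centered on
    the predicted waypoints (assumed form of 'trajectory Gaussianization')."""
    logp = 0.0
    for (x, y), (mx, my) in zip(trajectory, mean_trajectory):
        for value, mu in ((x, mx), (y, my)):
            logp += (-0.5 * ((value - mu) / sigma) ** 2
                     - math.log(sigma)
                     - 0.5 * math.log(2 * math.pi))
    return logp

def collision_reward(trajectory, obstacles, safe_radius=1.0):
    """Hypothetical reward: -1.0 if any waypoint comes within safe_radius
    of an obstacle center, otherwise 0.0."""
    for x, y in trajectory:
        for ox, oy in obstacles:
            if math.hypot(x - ox, y - oy) < safe_radius:
                return -1.0
    return 0.0

# Four sampled trajectories for one scene; only the second one collides.
group = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.0, 0.0), (1.0, -0.5)],
    [(0.0, 0.0), (0.5, 0.0)],
]
obstacles = [(1.0, 1.2)]
rewards = [collision_reward(t, obstacles, safe_radius=0.5) for t in group]
advantages = group_relative_advantages(rewards)
```

In this sketch the colliding sample receives a negative advantage and the others a positive one, so a policy-gradient step weighted by `advantages` and `gaussian_log_prob` would shift probability mass away from the colliding trajectory.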
