WorldRFT: Latent World Model Planning with Reinforcement Fine-Tuning for Autonomous Driving

Authors

  • Pengxuan Yang (The State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, CAS; School of Advanced Interdisciplinary Sciences, UCAS; Li Auto)
  • Ben Lu (Li Auto)
  • Zhongpu Xia (The State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, CAS)
  • Chao Han (Li Auto)
  • Yinfeng Gao (The State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, CAS)
  • Teng Zhang (Li Auto)
  • Kun Zhan (Li Auto)
  • Xianpeng Lang (Li Auto)
  • Yupeng Zheng (The State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, CAS)
  • Qichao Zhang (The State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, CAS; School of Artificial Intelligence, UCAS)

DOI:

https://doi.org/10.1609/aaai.v40i14.38149

Abstract

Latent world models enhance scene representation through temporal self-supervised learning, offering a paradigm for end-to-end autonomous driving that requires no perception annotations. However, reconstruction-oriented representation learning entangles perception with planning, which leads to suboptimal optimization for planning. To address this challenge, we propose WorldRFT, a planning-oriented latent world model framework that aligns scene representation learning with planning through a hierarchical planning decomposition and a local-aware interactive refinement mechanism, augmented by reinforcement learning fine-tuning (RFT) to improve policy performance in safety-critical scenarios. Specifically, WorldRFT integrates a vision-geometry foundation model to improve 3D spatial awareness, employs hierarchical planning task decomposition to guide representation optimization, and uses local-aware iterative refinement to derive a planning-oriented driving policy. Furthermore, we employ Group Relative Policy Optimization (GRPO), which applies trajectory Gaussianization and collision-aware rewards to fine-tune the driving policy, yielding systematic improvements in safety. WorldRFT achieves state-of-the-art (SOTA) performance on both the open-loop nuScenes and the closed-loop NavSim benchmarks. On nuScenes, it reduces the collision rate by 83% (0.30% → 0.05%). On NavSim, using camera-only sensor input, it attains performance competitive with the LiDAR-based SOTA method DiffusionDrive (87.8 vs. 88.1 PDMS).
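The abstract names GRPO, trajectory Gaussianization, and collision-aware rewards but gives no formulas. The following is a minimal sketch under our own assumptions, not the authors' implementation. We assume "trajectory Gaussianization" means treating the planner's deterministic waypoints as the mean of an isotropic Gaussian with a fixed, hypothetical standard deviation `sigma`, so that sampled trajectories have log-probabilities. The reward (negative average displacement to an expert trajectory minus a flat penalty for coming within a radius of a circular obstacle) and every function name are hypothetical. The GRPO parts are the standard ones: each reward is standardized against its own sampling group, and a PPO-style clipped surrogate is averaged over the group (the KL term is omitted). In real training `logp_new` would come from the current policy inside an autodiff framework; this stdlib version only computes values.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]


def sample_gaussian_trajectory(mean: Sequence[Point], sigma: float,
                               rng: random.Random) -> Trajectory:
    """Draw one trajectory from an isotropic Gaussian centred on the planner's output (assumed sigma)."""
    return [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma)) for x, y in mean]


def gaussian_log_prob(traj: Sequence[Point], mean: Sequence[Point], sigma: float) -> float:
    """Log-density of `traj` under the same Gaussian (independent x and y per waypoint)."""
    log_norm = -math.log(sigma * math.sqrt(2.0 * math.pi))
    total = 0.0
    for (x, y), (mx, my) in zip(traj, mean):
        for value, mu in ((x, mx), (y, my)):
            total += log_norm - 0.5 * ((value - mu) / sigma) ** 2
    return total


def collision_aware_reward(traj, expert, obstacles, radius=1.0, penalty=10.0) -> float:
    """Hypothetical reward: negative average displacement to the expert minus a collision penalty."""
    ade = sum(math.dist(p, q) for p, q in zip(traj, expert)) / len(traj)
    collided = any(math.dist(p, o) < radius for p in traj for o in obstacles)
    return -ade - (penalty if collided else 0.0)


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """GRPO advantage: standardize each reward by its group's mean and standard deviation."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


def grpo_clipped_loss(logp_new, logp_old, advantages, clip_eps: float = 0.2) -> float:
    """PPO-style clipped surrogate averaged over the group (KL penalty omitted)."""
    terms = []
    for new, old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(new - old)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        terms.append(min(ratio * adv, clipped * adv))
    return -sum(terms) / len(terms)  # negated: minimizing the loss maximizes the objective


if __name__ == "__main__":
    rng = random.Random(0)
    sigma = 0.5
    planned = [(float(t), 0.0) for t in range(1, 7)]  # policy's mean trajectory
    expert = [(float(t), 0.0) for t in range(1, 7)]   # logged expert trajectory
    obstacles = [(3.0, 1.2)]                          # hypothetical nearby agent
    group = [sample_gaussian_trajectory(planned, sigma, rng) for _ in range(8)]
    rewards = [collision_aware_reward(g, expert, obstacles) for g in group]
    advantages = group_relative_advantages(rewards)
    logp_old = [gaussian_log_prob(g, planned, sigma) for g in group]
    print("rewards:", [round(r, 2) for r in rewards])
    print("loss at old policy:", grpo_clipped_loss(logp_old, logp_old, advantages))
```

Because advantages are standardized within each group, sampled trajectories that collide receive negative advantages relative to their collision-free siblings, so no separate value network is needed; this is the property that makes GRPO attractive for fine-tuning a pretrained planner.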

Published

2026-03-14

How to Cite

Yang, P., Lu, B., Xia, Z., Han, C., Gao, Y., Zhang, T., Zhan, K., Lang, X., Zheng, Y., & Zhang, Q. (2026). WorldRFT: Latent World Model Planning with Reinforcement Fine-Tuning for Autonomous Driving. Proceedings of the AAAI Conference on Artificial Intelligence, 40(14), 11649-11657. https://doi.org/10.1609/aaai.v40i14.38149

Issue

Vol. 40 No. 14 (2026)

Section

AAAI Technical Track on Computer Vision XI