VILTA: A VLM-in-the-Loop Adversary for Enhancing Driving Policy Robustness

Authors

  • Qimao Chen, Tsinghua University
  • Fang Li, University of Macau; Xiaomi EV
  • Shaoqing Xu, University of Macau; Xiaomi EV
  • Zhiyi Lai, Xiaomi EV
  • Zixun Xie, Peking University
  • Yuechen Luo, Tsinghua University
  • Shengyin Jiang, Xiaomi EV
  • Hanbing Li, Xiaomi EV
  • Long Chen, Xiaomi EV
  • Bing Wang, Xiaomi EV
  • Yi Zhang, Tsinghua University
  • Zhi-Xin Yang, University of Macau

DOI:

https://doi.org/10.1609/aaai.v40i4.37290

Abstract

The safe deployment of autonomous driving (AD) systems is fundamentally hindered by the long-tail problem, in which rare yet critical driving scenarios are severely underrepresented in real-world data. Existing solutions, including safety-critical scenario generation and closed-loop learning, often rely on rule-based heuristics, resampling methods, and generative models learned from offline datasets, limiting their ability to produce diverse and novel challenges. While recent works leverage Vision-Language Models (VLMs) to produce scene descriptions that guide a separate, downstream model in generating hazardous trajectories for agents, such a two-stage framework constrains the generative potential of VLMs, as the diversity of the final trajectories is ultimately limited by the generalization ceiling of the downstream algorithm. To overcome these limitations, we introduce VILTA (VLM-In-the-Loop Trajectory Adversary), a novel framework that integrates a VLM into the closed-loop training of AD agents. Unlike prior works, VILTA actively participates in the training loop by comprehending the dynamic driving environment and strategically generating challenging scenarios through direct, fine-grained editing of surrounding agents' future trajectories. This direct-editing approach fully leverages the VLM's powerful generalization capabilities to create a diverse curriculum of plausible yet challenging scenarios that extends beyond the scope of traditional methods. We demonstrate that our approach substantially enhances the safety and robustness of the resulting AD policy, particularly its ability to navigate critical long-tail events.

Published

2026-03-14

How to Cite

Chen, Q., Li, F., Xu, S., Lai, Z., Xie, Z., Luo, Y., Jiang, S., Li, H., Chen, L., Wang, B., Zhang, Y., & Yang, Z.-X. (2026). VILTA: A VLM-in-the-Loop Adversary for Enhancing Driving Policy Robustness. Proceedings of the AAAI Conference on Artificial Intelligence, 40(4), 2984-2992. https://doi.org/10.1609/aaai.v40i4.37290

Section

AAAI Technical Track on Computer Vision I