Attention to Threat-Relevant Objects: Reasoning Detection in Autonomous Driving via Multimodal Large Language Models

Yulin He; Wei Chen; Xinbiao Gan; Siqi Wang; Haotian Wang; Yusong Tan

doi:10.1609/aaai.v40i6.42470

Authors

Yulin He National University of Defense Technology
Wei Chen National University of Defense Technology
Xinbiao Gan National University of Defense Technology
Siqi Wang National University of Defense Technology
Haotian Wang National University of Defense Technology
Yusong Tan National University of Defense Technology

Abstract

Perceiving threats is an innate human instinct. During driving, humans naturally focus their attention on objects that pose real potential risks. Motivated by this observation, we shift the focus from traditional class-based detection to a novel task termed threat-oriented reasoning detection in autonomous driving. This task aims to localize threat objects and reason about their threat levels from a driver-centric perspective. To support this task, we build a benchmark comprising diverse corner-case scenarios, annotated by multiple experienced drivers to reflect human-aligned threat cognition. Given the reasoning demands of this task, we then explore the capabilities of multimodal large language models (MLLMs) and introduce two methods, chosen according to whether the MLLM supports object detection: 1) For MLLMs lacking detection capability, we introduce ThreatCoT, a plug-and-play, training-free method that combines chain-of-thought (CoT) with a visual expert toolchain to support step-by-step reasoning. 2) For MLLMs with detection support, we introduce ThreatReasoner, an end-to-end reinforcement learning (RL)-based method built on the GRPO algorithm, which enables per-object reasoning through a fully unsupervised reward strategy. Both quantitative and qualitative experiments show that our methods can effectively unlock new capabilities of MLLMs in threat-oriented reasoning detection.
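
As background for the GRPO algorithm named in the abstract, the following is a minimal sketch of the group-relative advantage step that GRPO is generally described as using: several responses are sampled for the same input, each is scored, and the scores are normalized within the group. This is not the authors' implementation. The function name, the reward values, and the use of the population standard deviation are all assumptions made for illustration, and the abstract does not specify how ThreatReasoner's unsupervised reward is computed.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize one group of scalar rewards to zero mean and unit scale.

    `rewards` holds one score per sampled response to the same input.
    `eps` avoids division by zero when every reward in the group is equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population standard deviation
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled responses to one driving scene.
advantages = group_relative_advantages([0.2, 0.9, 0.4, 0.5])
```

Responses that score above the group mean receive a positive advantage and those below receive a negative one, so no separately learned value function is needed to provide a baseline.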
