Attention to Threat-Relevant Objects: Reasoning Detection in Autonomous Driving via Multimodal Large Language Models

Authors

  • Yulin He, National University of Defense Technology
  • Wei Chen, National University of Defense Technology
  • Xinbiao Gan, National University of Defense Technology
  • Siqi Wang, National University of Defense Technology
  • Haotian Wang, National University of Defense Technology
  • Yusong Tan, National University of Defense Technology

DOI:

https://doi.org/10.1609/aaai.v40i6.42470

Abstract

Perceiving threats is an innate human instinct. During driving, humans naturally focus their attention on objects that pose real potential risks. Motivated by this observation, we shift the focus from traditional class-based detection to a novel task, termed threat-oriented reasoning detection, in autonomous driving. This task aims to localize threat objects and reason about their threat levels from a driver-centric perspective. To support this task, we build a benchmark comprising diverse corner-case scenarios, annotated by multiple experienced drivers to reflect human-aligned threat cognition. Given the reasoning demands of this task, we then explore the capabilities of multimodal large language models (MLLMs) and introduce two methods, depending on whether the MLLM supports object detection: 1) for MLLMs lacking detection capability, we introduce ThreatCoT, a plug-and-play, training-free method that combines chain-of-thought (CoT) with a visual expert toolchain to support step-by-step reasoning; 2) for MLLMs with detection support, we introduce ThreatReasoner, an end-to-end reinforcement learning (RL) method built on the GRPO algorithm, which enables per-object reasoning through a fully unsupervised reward strategy. Both quantitative and qualitative experiments show that our methods effectively unlock new capabilities of MLLMs in threat-oriented reasoning detection.
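The abstract names GRPO as the basis of ThreatReasoner but does not describe its reward design. As general background only: GRPO (Group Relative Policy Optimization) samples a group of responses for each prompt and scores each response relative to the rest of its group instead of using a learned value critic. The minimal Python sketch below shows that group-relative advantage step, assuming one scalar reward per sampled response. The function name, the `eps` parameter, and the reward values are hypothetical, the choice of population standard deviation is an assumption (implementations differ), and none of this represents the authors' unsupervised reward strategy.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage (sketch): normalize each sampled response's reward
    by the mean and standard deviation of its own group of samples.

    Assumption: `rewards` holds one scalar per response sampled for the same
    prompt; how those rewards are computed is not specified here.
    """
    mean = statistics.fmean(rewards)
    # Population standard deviation; some implementations use the sample std instead.
    std = statistics.pstdev(rewards)
    # eps keeps the division finite when every reward in the group is identical.
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards for four responses sampled for one driving scene.
    print(group_relative_advantages([1.0, 0.0, 0.5, 0.5]))
```

Responses scoring above the group mean receive positive advantages and those below receive negative ones, so the policy update favors the better samples within each group; when all rewards in a group are equal, every advantage is zero and that group contributes no preference signal.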

Published

2026-03-14

How to Cite

He, Y., Chen, W., Gan, X., Wang, S., Wang, H., & Tan, Y. (2026). Attention to Threat-Relevant Objects: Reasoning Detection in Autonomous Driving via Multimodal Large Language Models. Proceedings of the AAAI Conference on Artificial Intelligence, 40(6), 4690–4698. https://doi.org/10.1609/aaai.v40i6.42470

Issue

Vol. 40 No. 6

Section

AAAI Technical Track on Computer Vision III