AD-FM: Multimodal LLMs for Anomaly Detection via Multi-Stage Reasoning and Fine-Grained Reward Optimization

Authors

  • Jingyi Liao, Nanyang Technological University; Institute for Infocomm Research (I2R), A*STAR
  • Yongyi Su, Institute for Infocomm Research (I2R), A*STAR; South China University of Technology
  • Rong-Cheng Tu, Nanyang Technological University
  • Zhao Jin, Nanyang Technological University
  • Wenhao Sun, Nanyang Technological University
  • Yiting Li, Institute for Infocomm Research (I2R), A*STAR
  • Xun Xu, Institute for Infocomm Research (I2R), A*STAR
  • Dacheng Tao, Nanyang Technological University
  • Xulei Yang, Institute for Infocomm Research (I2R), A*STAR

DOI:

https://doi.org/10.1609/aaai.v40i18.38548

Abstract

While Multimodal Large Language Models (MLLMs) demonstrate remarkable capabilities across diverse domains, their application to specialized anomaly detection (AD) remains constrained by domain adaptation challenges. Existing Group Relative Policy Optimization (GRPO) based approaches suffer from two critical limitations: inadequate training data utilization when models produce uniform responses, and insufficient supervision over the reasoning process, which encourages immediate binary decisions without deliberative analysis. We propose a comprehensive framework addressing these limitations through two synergistic innovations. First, we introduce a multi-stage deliberative reasoning process that guides models from region identification to focused examination, generating diverse response patterns essential for GRPO optimization while enabling structured supervision over analytical workflows. Second, we develop a fine-grained reward mechanism incorporating classification accuracy and localization supervision, transforming binary feedback into continuous signals that distinguish genuine analytical insight from spurious correctness. Comprehensive evaluation across multiple industrial datasets shows that our method achieves superior accuracy by enabling general-purpose MLLMs to acquire fine-grained visual discrimination for detecting subtle manufacturing defects.
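
The first limitation named in the abstract can be illustrated with a small sketch. GRPO standardizes rewards within a group of sampled responses, so a group whose responses all earn the same binary reward yields zero advantage and contributes no gradient. The code below is an illustration only, not the authors' implementation: the reward form (classification correctness plus a weighted box IoU), the `loc_weight` parameter, and all function names are assumptions made for this example, and the reward actually used in the paper may differ.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: standardize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fine_grained_reward(pred_label, true_label, pred_box, true_box, loc_weight=0.5):
    """Hypothetical continuous reward: correctness plus weighted localization IoU.

    This combination and the default weight are assumptions for illustration.
    """
    cls = 1.0 if pred_label == true_label else 0.0
    if pred_box is None or true_box is None:
        loc = 0.0
    else:
        loc = box_iou(pred_box, true_box)
    return cls + loc_weight * loc


# Four sampled responses that all answer "anomaly" correctly.
true_box = (10, 10, 50, 50)
pred_boxes = [(10, 10, 50, 50), (20, 20, 60, 60), (100, 100, 120, 120), None]

# Binary feedback: identical rewards, so every advantage is zero.
binary_rewards = [1.0, 1.0, 1.0, 1.0]
binary_adv = group_relative_advantages(binary_rewards)

# Continuous feedback: localization quality separates the responses.
fine_rewards = [
    fine_grained_reward("anomaly", "anomaly", box, true_box) for box in pred_boxes
]
fine_adv = group_relative_advantages(fine_rewards)
```

Under these assumptions, `binary_adv` is all zeros, whereas `fine_adv` ranks the well-localized response above the partially overlapping one and both above the responses that are correct without pointing at the defect, which is the sense in which a continuous signal can separate grounded answers from spuriously correct ones.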

Published

2026-03-14

How to Cite

Liao, J., Su, Y., Tu, R.-C., Jin, Z., Sun, W., Li, Y., … Yang, X. (2026). AD-FM: Multimodal LLMs for Anomaly Detection via Multi-Stage Reasoning and Fine-Grained Reward Optimization. Proceedings of the AAAI Conference on Artificial Intelligence, 40(18), 15234–15242. https://doi.org/10.1609/aaai.v40i18.38548

Section

AAAI Technical Track on Data Mining & Knowledge Management II