RefineVAD: Semantic-Guided Feature Recalibration for Weakly Supervised Video Anomaly Detection

Authors

  • Junhee Lee, Kyung Hee University
  • ChaeBeen Bang, Kyung Hee University
  • MyoungChul Kim, Kyung Hee University
  • MyeongAh Cho, Kyung Hee University

DOI:

https://doi.org/10.1609/aaai.v40i7.37512

Abstract

Weakly-Supervised Video Anomaly Detection aims to identify anomalous events using only video-level labels, balancing annotation efficiency with practical applicability. However, existing methods often oversimplify the anomaly space by treating all abnormal events as a single category, overlooking the diverse semantic and temporal characteristics intrinsic to real-world anomalies. Humans perceive anomalies by jointly interpreting the temporal motion patterns and the semantic structures underlying different anomaly types. Inspired by this, we propose RefineVAD, a novel framework that mimics this dual-process reasoning. Our framework integrates two core modules. The first, Motion-aware Temporal Attention and Recalibration (MoTAR), estimates motion salience and dynamically adjusts temporal focus via shift-based attention and global Transformer-based modeling. The second, Category-Oriented Refinement (CORE), injects soft anomaly category priors into the representation space by aligning segment-level features with learnable category prototypes through cross-attention. By jointly leveraging temporal dynamics and semantic structure, RefineVAD explicitly models both "how" motion evolves and "what" semantic category it resembles. Extensive experiments on WVAD benchmarks validate the effectiveness of RefineVAD and highlight the importance of integrating semantic context to guide feature refinement toward anomaly-relevant patterns.
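
The prototype alignment described for CORE can be illustrated with a minimal NumPy sketch of cross-attention between segment features and category prototypes. This is not the authors' implementation: the function name, the residual update, the absence of learned query/key/value projections, and all sizes (32 segments, 4 categories, 64 dimensions) are assumptions chosen for illustration, and the prototypes here are random rather than learned.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def prototype_cross_attention(segments, prototypes):
    """Hypothetical helper: attend from segment features to category prototypes.

    segments:   (T, D) array, one feature vector per video segment (queries).
    prototypes: (K, D) array, one vector per anomaly category (keys and values).
    Returns the refined features (T, D) and the soft category weights (T, K).
    """
    d = segments.shape[-1]
    # Scaled dot-product similarity between every segment and every prototype.
    scores = segments @ prototypes.T / np.sqrt(d)
    # Each row is a soft assignment of one segment over the K categories.
    weights = softmax(scores, axis=-1)
    # Category context for each segment: weighted mix of the prototypes.
    context = weights @ prototypes
    # Assumed residual update: inject the category prior into the features.
    refined = segments + context
    return refined, weights


rng = np.random.default_rng(0)
segments = rng.normal(size=(32, 64))    # assumed: 32 segments, 64-dim features
prototypes = rng.normal(size=(4, 64))   # assumed: 4 categories, random stand-ins
refined, weights = prototype_cross_attention(segments, prototypes)
print(refined.shape, weights.shape)
```

Each row of weights sums to one, so it can be read as a soft category prior for that segment. In a trained model the prototypes would be parameters updated by gradient descent.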

Published

2026-03-14

How to Cite

Lee, J., Bang, C., Kim, M., & Cho, M. (2026). RefineVAD: Semantic-Guided Feature Recalibration for Weakly Supervised Video Anomaly Detection. Proceedings of the AAAI Conference on Artificial Intelligence, 40(7), 5899–5907. https://doi.org/10.1609/aaai.v40i7.37512

Section

AAAI Technical Track on Computer Vision IV