RefineVAD: Semantic-Guided Feature Recalibration for Weakly Supervised Video Anomaly Detection
DOI:
https://doi.org/10.1609/aaai.v40i7.37512
Abstract
Weakly-Supervised Video Anomaly Detection (WVAD) aims to identify anomalous events using only video-level labels, balancing annotation efficiency with practical applicability. However, existing methods often oversimplify the anomaly space by treating all abnormal events as a single category, overlooking the diverse semantic and temporal characteristics intrinsic to real-world anomalies. Inspired by how humans perceive anomalies, namely by jointly interpreting the temporal motion patterns and the semantic structures underlying different anomaly types, we propose RefineVAD, a novel framework that mimics this dual-process reasoning. The framework integrates two core modules. The first, Motion-aware Temporal Attention and Recalibration (MoTAR), estimates motion salience and dynamically adjusts temporal focus via shift-based attention and global Transformer-based modeling. The second, Category-Oriented Refinement (CORE), injects soft anomaly category priors into the representation space by aligning segment-level features with learnable category prototypes through cross-attention. By jointly leveraging temporal dynamics and semantic structure, RefineVAD explicitly models both "how" motion evolves and "what" semantic category it resembles. Extensive experiments on WVAD benchmarks validate the effectiveness of RefineVAD and highlight the importance of integrating semantic context to guide feature refinement toward anomaly-relevant patterns.
Published
2026-03-14
How to Cite
Lee, J., Bang, C., Kim, M., & Cho, M. (2026). RefineVAD: Semantic-Guided Feature Recalibration for Weakly Supervised Video Anomaly Detection. Proceedings of the AAAI Conference on Artificial Intelligence, 40(7), 5899–5907. https://doi.org/10.1609/aaai.v40i7.37512
Issue
Vol. 40 No. 7
Section
AAAI Technical Track on Computer Vision IV
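The CORE module summarized in the abstract aligns segment-level features with learnable category prototypes through cross-attention. The following is a minimal plain-Python sketch of that idea under several assumptions that the abstract does not state: segment features act as queries and the prototypes act as both keys and values, the query/key/value projections are identity maps, and the attended prototype mixture is added back to each segment residually with a hypothetical weight `alpha`. The function name `category_cross_attention` is hypothetical, and the prototypes here are fixed random vectors rather than parameters trained end to end, so this illustrates the general technique, not the authors' implementation.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def category_cross_attention(segments, prototypes, alpha=1.0):
    """Hypothetical CORE-style refinement (an assumption, not the paper's code).

    segments:   T feature vectors of length D (queries)
    prototypes: K category prototype vectors of length D (keys and values)
    alpha:      hypothetical weight of the residual prototype injection
    Returns (refined, attention); attention[t][k] is the soft category
    prior of segment t over prototype k.
    """
    d = len(prototypes[0])
    scale = 1.0 / math.sqrt(d)  # scaled dot-product attention
    refined, attention = [], []
    for x in segments:
        # Similarity of this segment to every category prototype.
        weights = softmax([dot(x, p) * scale for p in prototypes])
        # Attention-weighted mixture of prototypes (identity value projection).
        context = [sum(w * p[j] for w, p in zip(weights, prototypes)) for j in range(d)]
        # Inject the soft category prior back into the segment feature.
        refined.append([xj + alpha * cj for xj, cj in zip(x, context)])
        attention.append(weights)
    return refined, attention


# Toy usage: 5 segments, 4 hypothetical anomaly categories, 8-dim features.
random.seed(0)
T, K, D = 5, 4, 8
prototypes = [[random.gauss(0.0, 1.0) for _ in range(D)] for _ in range(K)]
segments = [[random.gauss(0.0, 1.0) for _ in range(D)] for _ in range(T)]
refined, attention = category_cross_attention(segments, prototypes)
for t, row in enumerate(attention):
    print(t, [round(w, 3) for w in row])
```

Each row of `attention` can be read as a soft category prior for one segment. In a trained model the prototypes (and any projection matrices) would be learned jointly with the rest of the network rather than sampled at random.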