Federated Weakly Supervised Video Anomaly Detection with Multimodal Prompt

Benfeng Wang; Chao Huang; Jie Wen; Wei Wang; Yabo Liu; Yong Xu

doi:10.1609/aaai.v39i20.35398

Authors

Benfeng Wang, Sun Yat-Sen University
Chao Huang, Sun Yat-Sen University
Jie Wen, Harbin Institute of Technology
Wei Wang, Sun Yat-Sen University
Yabo Liu, Harbin Institute of Technology
Yong Xu, Harbin Institute of Technology

Abstract

Video anomaly detection (VAD) aims to locate abnormal events in videos. Weakly supervised VAD (WSVAD), which requires only video-level annotations during training, has recently made great progress. In practical applications, different institutions may hold different types of abnormal videos, but these videos cannot be shared over the internet because of privacy protection. To train a more generalized anomaly detector that can identify various anomalies, it is reasonable to introduce federated learning into WSVAD. In this paper, we propose Global and Local Context-driven Federated Learning, a new paradigm for privacy-protected weakly supervised video anomaly detection. Specifically, we use the vision-language association of CLIP to detect whether a video frame is abnormal. Instead of relying on handcrafted text prompts for CLIP, we propose a text prompt generator whose output is influenced by both textual and visual information. On the one hand, the text provides global context related to anomalies, which improves the model's generalization ability. On the other hand, the visual information provides personalized local context, because different clients may have videos with different types of anomalies or scenes. The generated prompt thus ensures global generalization while handling personalized data from different clients. Extensive experiments show that the proposed method achieves remarkable performance.
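
The two ingredients named in the abstract, CLIP-style frame scoring against text prompts and federated aggregation across clients, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the two-prompt (normal/abnormal) setup, the temperature value of 0.07, and the use of plain sample-weighted federated averaging (FedAvg) are all assumptions made for illustration, and the feature vectors are toy lists rather than real CLIP embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def anomaly_score(frame_feat, normal_prompt, abnormal_prompt, temperature=0.07):
    """Probability that a frame is abnormal (hypothetical helper).

    Applies a softmax over temperature-scaled cosine similarities between
    the frame feature and two prompt embeddings, in the style of CLIP
    zero-shot classification.
    """
    s_normal = cosine(frame_feat, normal_prompt) / temperature
    s_abnormal = cosine(frame_feat, abnormal_prompt) / temperature
    shift = max(s_normal, s_abnormal)  # subtract the max for numerical stability
    e_normal = math.exp(s_normal - shift)
    e_abnormal = math.exp(s_abnormal - shift)
    return e_abnormal / (e_normal + e_abnormal)


def fedavg(client_params, client_sizes):
    """Sample-weighted average of client parameter vectors (standard FedAvg).

    Assumption: the server aggregates a flat parameter vector per client;
    which parameters the paper actually shares is not stated in the abstract.
    """
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(params[i] * size for params, size in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]


# Toy usage: two clients with different data sizes, one frame to score.
global_params = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
score = anomaly_score([1.0, 0.0], normal_prompt=[0.0, 1.0], abnormal_prompt=[1.0, 0.0])
```

In this sketch only raw parameters leave each client, never the videos themselves, which is the privacy property the abstract motivates. A prompt generator as described in the abstract would replace the fixed prompt vectors with embeddings produced from both text and client-specific visual features.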
