SmartEyes: Plug-and-Play Event Detection for Retail Loss Prevention

Authors

  • Pi-Wei Chen, Silesian University of Technology
  • Jerry Chun-Wei Lin, BakuAI
  • Barış Fahri Kahrıman, Silesian University of Technology
  • Zih-Ching Chen, Nvidia
  • Rafał Cupek, Silesian University of Technology
  • Marek Drewniak, Aiut

DOI:

https://doi.org/10.1609/aaai.v40i48.42339

Abstract

Event detection is essential for surveillance, particularly in retail loss prevention, where accurate and timely monitoring is critical. Vision Language Models (VLMs) generalize well but are inefficient at processing full video streams and are prone to hallucinations induced by redundant frames. We present SmartEyes, a plug-and-play system for real-time retail surveillance. SmartEyes introduces the Perception Cognition Focusing (PCF) framework, which combines lightweight perception with semantic triggering to isolate two keyframes (customer contact and departure) and constrains the VLMs to a focused differencing task. This design reduces hallucination by 44% compared to vanilla VLMs. Although demonstrated on a retail application, the proposed perception-to-reasoning pipeline is general and extends directly to industrial environments that require reliable event detection and real-time decision-making. Our demo includes a user-friendly Region of Interest (ROI) selection interface and live CCTV monitoring, producing accurate alerts within 1–2 seconds on a single RTX 4080 GPU. The lightweight design enables efficient deployment in broader industrial applications.
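The keyframe isolation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the perception stage yields one person bounding box per frame (or None when nobody is detected), treats a box overlapping the ROI as "contact", and takes the first overlapping frame and the first frame after the overlap ends as the two keyframes. The names `boxes_overlap` and `select_keyframes` are hypothetical.

```python
from typing import List, Optional, Tuple

# Axis-aligned box as (x1, y1, x2, y2); an assumed representation.
Box = Tuple[float, float, float, float]


def boxes_overlap(a: Box, b: Box) -> bool:
    """Return True when two axis-aligned boxes share a non-empty area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def select_keyframes(
    person_boxes: List[Optional[Box]], roi: Box
) -> List[Tuple[int, int]]:
    """Return (contact_index, departure_index) pairs, one per contact episode.

    contact_index is the first frame whose person box overlaps the ROI;
    departure_index is the first later frame where it no longer does.
    An episode still open at the end of the stream is not emitted.
    """
    events: List[Tuple[int, int]] = []
    contact_start: Optional[int] = None
    for i, box in enumerate(person_boxes):
        touching = box is not None and boxes_overlap(box, roi)
        if touching and contact_start is None:
            contact_start = i  # contact begins
        elif not touching and contact_start is not None:
            events.append((contact_start, i))  # departure observed
            contact_start = None
    return events
```

Under these assumptions, only the two frames in each returned pair would be passed to the VLM for comparison, rather than every frame of the stream.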

Published

2026-03-14

How to Cite

Chen, P.-W., Lin, J. C.-W., Kahrıman, B. F., Chen, Z.-C., Cupek, R., & Drewniak, M. (2026). SmartEyes: Plug-and-Play Event Detection for Retail Loss Prevention. Proceedings of the AAAI Conference on Artificial Intelligence, 40(48), 41559–41561. https://doi.org/10.1609/aaai.v40i48.42339