SafeLens: Segment-Level Hate Speech Detection in Online Videos

Authors

  • Zhuoran Wang, Singapore University of Technology and Design
  • Dylan Raharja, Singapore University of Technology and Design
  • Yujia Hu, Singapore University of Technology and Design
  • Roy Ka-Wei Lee, Singapore University of Technology and Design

DOI:

https://doi.org/10.1609/aaai.v40i48.42390

Abstract

We present SafeLens, a lightweight segment-level video moderation system that fuses speech, text, and visual frames to detect hateful content in each segment. For every segment, SafeLens returns a structured prediction: a label, a prediction confidence, the reasons for the flag, and the harm categories. The structured predictions are optimized for triage, appeals, and downstream enforcement. The system is modular, with pluggable back-ends for speech, text, and visual processing and a mid-size policy Large Language Model (LLM) agent with parameter-efficient tuning. In the live demo, attendees can upload or select clips, scrub the timeline to flag hateful segments, inspect rationales, and vary the policy LLM agent to benchmark hateful content moderation performance.
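
The per-segment structured prediction described in the abstract might be represented roughly as follows. This is a minimal sketch under stated assumptions, not the SafeLens implementation: the class name `SegmentPrediction`, the field names, the timestamp fields, the label strings, the placeholder category names, and the `triage_queue` helper with its 0.5 threshold are all hypothetical. Only the four kinds of information (label, confidence, reasons, harm categories) come from the abstract.

```python
from dataclasses import dataclass, field, asdict
from typing import List
import json


@dataclass
class SegmentPrediction:
    """Hypothetical container for one segment's structured prediction."""
    start_s: float                 # assumed: segment start time in seconds
    end_s: float                   # assumed: segment end time in seconds
    label: str                     # assumed values: "hateful" / "non-hateful"
    confidence: float              # prediction confidence in [0, 1]
    reasons: List[str] = field(default_factory=list)          # reasons for the flag
    harm_categories: List[str] = field(default_factory=list)  # harm categories


def triage_queue(predictions, min_confidence=0.5):
    """Return flagged segments at or above a threshold, most confident first.

    The threshold and ordering are illustrative choices, not taken from the paper.
    """
    flagged = [
        p for p in predictions
        if p.label == "hateful" and p.confidence >= min_confidence
    ]
    return sorted(flagged, key=lambda p: p.confidence, reverse=True)


# Placeholder predictions for three consecutive 10-second segments.
example = [
    SegmentPrediction(0.0, 10.0, "non-hateful", 0.93),
    SegmentPrediction(10.0, 20.0, "hateful", 0.81,
                      ["placeholder rationale"], ["placeholder_category"]),
    SegmentPrediction(20.0, 30.0, "hateful", 0.42,
                      ["placeholder rationale"], ["placeholder_category"]),
]

queue = triage_queue(example)
# Serialize the queue as JSON, e.g. for a review or appeals tool.
print(json.dumps([asdict(p) for p in queue], indent=2))
```

Keeping the reasons and harm categories alongside the label and confidence is what would let a reviewer sort a queue, read the rationale for a flag, and route an appeal without re-running the model.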

Published

2026-03-14

How to Cite

Wang, Z., Raharja, D., Hu, Y., & Lee, R. K.-W. (2026). SafeLens: Segment-Level Hate Speech Detection in Online Videos. Proceedings of the AAAI Conference on Artificial Intelligence, 40(48), 41712–41714. https://doi.org/10.1609/aaai.v40i48.42390