Zero-Shot Event Detection by Multimodal Distributional Semantic Embedding of Videos

Authors

  • Mohamed Elhoseiny, Rutgers University
  • Jingen Liu, SRI International
  • Hui Cheng, SRI International
  • Harpreet Sawhney, SRI International
  • Ahmed Elgammal, Rutgers University

DOI:

https://doi.org/10.1609/aaai.v30i1.10458

Keywords:

Language & Vision, Event Detection, Zero-Shot Detection, Action Recognition

Abstract

We propose a new zero-shot event detection method based on multi-modal distributional semantic embedding of videos. Our model embeds object and action concepts, as well as other available modalities, from videos into a distributional semantic space. To our knowledge, this is the first zero-shot event detection model built on top of distributional semantics, which it extends in the following directions: (a) semantic embedding of multimodal information in videos (with a focus on the visual modalities), (b) semantic embedding of concept definitions, and (c) retrieval of videos by a free-text event query (e.g., "changing a vehicle tire") based on their content. We first embed the video into the multi-modal semantic space and then measure the similarity between the video embedding and the event query given in free-text form. We validated our method on the large TRECVID MED (Multimedia Event Detection) challenge. Using only the event title as a query, our method outperformed the state of the art, which relies on long textual event descriptions, improving MAP from 12.6% to 13.5% and ROC-AUC from 0.73 to 0.83. It is also an order of magnitude faster.
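A minimal sketch of the retrieval idea the abstract describes, not the authors' implementation: per-video concept detection scores are mapped into a distributional semantic (word-vector) space, the free-text event query is embedded in the same space, and videos are ranked by similarity to the query. The names `word_vecs`, `concept_scores`, and the mean-pooling/weighted-sum choices below are illustrative assumptions.

```python
import numpy as np

def embed_text(text, word_vecs):
    """Embed free text (an event title or a concept name) as the mean of its word vectors.
    word_vecs is assumed to be a dict mapping words to d-dimensional arrays
    (e.g., pretrained distributional embeddings)."""
    vecs = [word_vecs[w] for w in text.lower().split() if w in word_vecs]
    return np.mean(vecs, axis=0) if vecs else None

def embed_video(concept_scores, word_vecs):
    """Embed a video as the detection-confidence-weighted sum of the semantic
    embeddings of its detected object/action concepts (an assumed pooling scheme)."""
    acc, total = None, 0.0
    for concept, score in concept_scores.items():
        v = embed_text(concept, word_vecs)
        if v is None:
            continue
        acc = score * v if acc is None else acc + score * v
        total += score
    return acc / total if total > 0 else None

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_videos(event_query, videos, word_vecs):
    """Rank videos by similarity between their semantic embeddings and the
    embedding of the free-text event query (e.g., "changing a vehicle tire")."""
    q = embed_text(event_query, word_vecs)
    scored = []
    for vid_id, concept_scores in videos.items():
        v = embed_video(concept_scores, word_vecs)
        if v is not None:
            scored.append((vid_id, cosine(q, v)))
    return sorted(scored, key=lambda x: -x[1])
```

Because the query is compared to videos purely through the shared semantic space, no training examples of the target event are needed, which is what makes the detection zero-shot.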

Published

2016-03-05

How to Cite

Elhoseiny, M., Liu, J., Cheng, H., Sawhney, H., & Elgammal, A. (2016). Zero-Shot Event Detection by Multimodal Distributional Semantic Embedding of Videos. Proceedings of the AAAI Conference on Artificial Intelligence, 30(1). https://doi.org/10.1609/aaai.v30i1.10458