A Deep Ranking Model for Spatio-Temporal Highlight Detection From a 360° Video

Authors

  • Youngjae Yu, Seoul National University, Vision and Learning Lab
  • Sangho Lee, Seoul National University, Vision and Learning Lab
  • Joonil Na, Seoul National University, Vision and Learning Lab
  • Jaeyun Kang, Seoul National University
  • Gunhee Kim, Seoul National University, Vision and Learning Lab

DOI:

https://doi.org/10.1609/aaai.v32i1.12335

Keywords:

Automatic summarization of 360-degree videos, Deep pairwise ranking models, Weakly supervised large-scale web video training

Abstract

We address the problem of highlight detection from a 360° video by summarizing it both spatially and temporally. Given a long 360° video, we spatially select pleasant-looking normal field-of-view (NFOV) segments from the unlimited fields of view (FOV) of the 360° video, and temporally summarize it into a concise and informative highlight, formed as a selected subset of subshots. We propose a novel deep ranking model, the Composition View Score (CVS) model, which produces a spherical score map of composition per video segment and determines which view is suitable for the highlight via a sliding window kernel at inference. To evaluate the proposed framework, we perform experiments on the Pano2Vid benchmark dataset (Su, Jayaraman, and Grauman 2016) and on our newly collected 360° video highlight dataset from YouTube and Vimeo. Through evaluation using both quantitative summarization metrics and user studies via Amazon Mechanical Turk, we demonstrate that our approach outperforms several state-of-the-art highlight detection methods. We also show that our model is 16 times faster at inference than AutoCam (Su, Jayaraman, and Grauman 2016), one of the first summarization algorithms for 360° videos.
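
To make the idea of scanning a score map with a sliding window concrete, here is a minimal sketch. It is not the CVS model or the authors' implementation: it assumes the spherical score map has already been flattened into a hypothetical equirectangular grid (rows as latitude, columns as longitude), uses a plain mean as the window kernel, and the names `best_view`, `score_map`, `win_h`, and `win_w` are illustrative only.

```python
from typing import List, Tuple


def best_view(score_map: List[List[float]], win_h: int, win_w: int) -> Tuple[int, int, float]:
    """Return (top_row, left_col, mean_score) of the highest-scoring window.

    Assumption: score_map is an equirectangular grid of per-cell scores.
    Columns wrap around because longitude is periodic; rows do not wrap.
    """
    rows, cols = len(score_map), len(score_map[0])
    if not (0 < win_h <= rows and 0 < win_w <= cols):
        raise ValueError("window must fit inside the score map")

    best = (0, 0, float("-inf"))
    for top in range(rows - win_h + 1):
        for left in range(cols):  # every column is a valid start due to wrap-around
            total = 0.0
            for dr in range(win_h):
                for dc in range(win_w):
                    total += score_map[top + dr][(left + dc) % cols]
            mean = total / (win_h * win_w)
            if mean > best[2]:
                best = (top, left, mean)
    return best


# Toy 3x4 score map: the strongest region straddles the left/right seam.
scores = [
    [0.9, 0.1, 0.1, 0.8],
    [0.7, 0.2, 0.1, 0.9],
    [0.1, 0.1, 0.1, 0.1],
]
top, left, mean = best_view(scores, win_h=2, win_w=2)
print(top, left, round(mean, 3))
```

In this toy input the selected window starts at the last column and wraps to the first, which is why the column index is taken modulo the grid width. A real system would also need to account for distortion near the poles, which this sketch ignores.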

Published

2018-04-27

How to Cite

Yu, Y., Lee, S., Na, J., Kang, J., & Kim, G. (2018). A Deep Ranking Model for Spatio-Temporal Highlight Detection From a 360° Video. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1). https://doi.org/10.1609/aaai.v32i1.12335