Find Objects and Focus on Highlights: Mining Object Semantics for Video Highlight Detection via Graph Neural Networks
DOI: https://doi.org/10.1609/aaai.v34i07.6988

Abstract
With the increasing prevalence of portable computing devices, browsing unedited videos is time-consuming and tedious. Video highlight detection, which discovers the moments of major or special interest to a user in a video, has the potential to significantly ease this situation. Existing methods suffer from two problems. First, most existing approaches focus only on learning holistic visual representations of videos and ignore object semantics when inferring video highlights. Second, current state-of-the-art approaches often adopt a pairwise ranking-based strategy, which cannot exploit global information to infer highlights. Therefore, we propose a novel video highlight detection framework, named VH-GNN, which constructs an object-aware graph and models the relationships between objects from a global view. To reduce computational cost, we decompose the whole graph into two types of graphs: a spatial graph that captures the complex interactions among objects within each frame, and a temporal graph that obtains an object-aware representation of each frame and captures global information. In addition, we optimize the framework via a proposed multi-stage loss, where the first stage determines the highlight probability and the second stage leverages the relationships between frames and focuses on hard examples from the first stage. Extensive experiments on two standard datasets strongly evidence that VH-GNN achieves significant performance gains over state-of-the-art methods.
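To make the spatial/temporal graph decomposition concrete, the following is a minimal, hedged sketch of the overall pipeline: object features in each frame pass through a spatial graph layer, are pooled into frame nodes, and a temporal graph layer over frames produces per-frame highlight scores. All module names, dimensions, and the fully-connected, row-normalized adjacency are illustrative assumptions, not the authors' implementation or the paper's exact architecture or loss.

```python
# Illustrative sketch only; names, dimensions, and graph construction are assumptions.
import torch
import torch.nn as nn


class SimpleGraphLayer(nn.Module):
    """One round of message passing: aggregate neighbor features, then transform."""

    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x, adj):
        # x: (num_nodes, dim); adj: (num_nodes, num_nodes) row-normalized adjacency
        return torch.relu(self.linear(adj @ x))


class VHGNNSketch(nn.Module):
    """Spatial graph over objects within a frame, temporal graph over frames."""

    def __init__(self, obj_dim=512, hidden=256):
        super().__init__()
        self.proj = nn.Linear(obj_dim, hidden)
        self.spatial_gnn = SimpleGraphLayer(hidden)   # objects within a frame
        self.temporal_gnn = SimpleGraphLayer(hidden)  # frames within a video
        self.scorer = nn.Linear(hidden, 1)            # per-frame highlight score

    @staticmethod
    def full_adj(n):
        # Fully-connected, row-normalized adjacency (an assumed graph structure).
        adj = torch.ones(n, n)
        return adj / adj.sum(dim=1, keepdim=True)

    def forward(self, object_feats):
        # object_feats: list of (num_objects_t, obj_dim) tensors, one per frame
        frame_reprs = []
        for objs in object_feats:
            h = torch.relu(self.proj(objs))
            h = self.spatial_gnn(h, self.full_adj(h.size(0)))
            frame_reprs.append(h.mean(dim=0))          # pool objects into a frame node
        frames = torch.stack(frame_reprs)              # (num_frames, hidden)
        frames = self.temporal_gnn(frames, self.full_adj(frames.size(0)))
        return self.scorer(frames).squeeze(-1)         # (num_frames,) highlight logits


# Example: 5 frames, each with a variable number of detected objects.
model = VHGNNSketch()
feats = [torch.randn(torch.randint(2, 6, (1,)).item(), 512) for _ in range(5)]
print(model(feats))
```

The per-frame logits from such a model could then be supervised in stages, e.g. a first-stage highlight-probability loss followed by a second stage that reweights hard frames, though the exact loss formulation is given in the paper rather than in this sketch.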