A Spherical Convolution Approach for Learning Long Term Viewport Prediction in 360 Immersive Video

Chenglei Wu; Ruixiao Zhang; Zhi Wang; Lifeng Sun

doi:10.1609/aaai.v34i01.7377

Authors

Chenglei Wu Tsinghua University
Ruixiao Zhang Tsinghua University
Zhi Wang Tsinghua University
Lifeng Sun Tsinghua University

Abstract

Viewport prediction for 360 video forecasts a viewer’s viewport while they watch a 360 video with a head-mounted display, which benefits many VR/AR applications such as 360 video streaming and mobile cloud VR. Existing studies based on planar convolutional neural networks (CNNs) suffer from the image distortion and splitting caused by the sphere-to-plane projection. In this paper, we start by proposing a spherical convolution based feature extraction network to distill spatial-temporal 360 information. We provide a solution for training such a network without a dedicated 360 image or video classification dataset. Unlike previous methods, which base their predictions on pixel-level image information, we propose a viewport prediction scheme based on semantic content and user preference. We adopt a recurrent neural network (RNN) to extract a user's personal preference for 360 video content from minutes of embedded viewing histories. We use this semantic preference as spatial attention to help the network find the regions of interest in a future video. We further design a tailored mixture density network (MDN) based viewport prediction scheme, including viewport modeling and a tailored loss function, among other components, to improve efficiency and accuracy. Our extensive experiments demonstrate the rationality and performance of our method, which outperforms state-of-the-art methods, especially in long-term prediction.
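
The abstract does not spell out the MDN formulation or the tailored loss. As a rough illustration of the general idea only, the sketch below computes the standard negative log-likelihood of an observed viewport center under a mixture of isotropic 2-D Gaussians. The function name `mdn_nll`, the (yaw, pitch) parameterization, and the isotropic covariance are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def mdn_nll(pi_logits, mu, log_sigma, target):
    """Negative log-likelihood of `target` under a mixture of isotropic 2-D Gaussians.

    Hypothetical parameterization, assumed for illustration:
      pi_logits: (K,)   unnormalized mixture weights
      mu:        (K, 2) component means, e.g. (yaw, pitch) in radians
      log_sigma: (K,)   log standard deviation of each component
      target:    (2,)   observed viewport center
    """
    pi_logits = np.asarray(pi_logits, dtype=float)
    mu = np.asarray(mu, dtype=float)
    log_sigma = np.asarray(log_sigma, dtype=float)
    target = np.asarray(target, dtype=float)

    # Log-softmax turns the logits into normalized log mixture weights.
    log_pi = pi_logits - np.logaddexp.reduce(pi_logits)

    # Squared Euclidean distance from the target to each component mean.
    # Simplification: yaw wraparound at +/- pi is ignored here.
    sq_dist = np.sum((target - mu) ** 2, axis=1)

    # Log density of an isotropic 2-D Gaussian:
    #   -log(2*pi) - 2*log(sigma) - d^2 / (2*sigma^2)
    log_prob = (-np.log(2.0 * np.pi)
                - 2.0 * log_sigma
                - sq_dist / (2.0 * np.exp(2.0 * log_sigma)))

    # Log-sum-exp over components gives the mixture log-likelihood.
    return -np.logaddexp.reduce(log_pi + log_prob)
```

A mixture output lets a predictor place probability mass on several candidate regions at once, which is plausibly useful for long-term prediction where a single point estimate may be unreliable. A loss designed for spherical data would also need to handle the wraparound in yaw, which this sketch does not.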
