FFNet: Frequency Fusion Network for Semantic Scene Completion

Xuzhi Wang; Di Lin; Liang Wan

doi:10.1609/aaai.v36i3.20156

Authors

Xuzhi Wang, College of Intelligence and Computing, Tianjin University
Di Lin, College of Intelligence and Computing, Tianjin University
Liang Wan, College of Intelligence and Computing, Tianjin University

DOI:

https://doi.org/10.1609/aaai.v36i3.20156

Keywords:

Computer Vision (CV)

Abstract

Semantic scene completion (SSC) requires the estimation of the 3D geometric occupancies of objects in the scene, along with the object categories. Currently, many methods employ RGB-D images to capture the geometric and semantic information of objects. These methods use simple but popular spatial- and channel-wise operations, which fuse the information of RGB and depth data. Yet, they ignore the large discrepancy of RGB-D data and the uncertainty measurements of depth data. To solve this problem, we propose the Frequency Fusion Network (FFNet), a novel method for boosting semantic scene completion by better utilizing RGB-D data. FFNet explicitly correlates the RGB-D data in the frequency domain, different from the features directly extracted by the convolution operation. Then, the network uses the correlated information to guide the feature learning from the RGB and depth images, respectively. Moreover, FFNet accounts for the properties of different frequency components of RGB-D features. It has a learnable elliptical mask to decompose the features learned from the RGB and depth images, attending to various frequencies to facilitate the correlation process of RGB-D data. We evaluate FFNet intensively on the public SSC benchmarks, where FFNet surpasses the state-of-the-art methods. The code package of FFNet is available at https://github.com/alanWXZ/FFNet.
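
To illustrate what decomposing a feature map with an elliptical mask in the frequency domain can look like, here is a minimal NumPy sketch. It is not the authors' implementation: the function names `elliptical_mask` and `split_frequencies`, the hard binary mask, and the fixed semi-axes `a` and `b` are all assumptions made for illustration. In a trainable network the semi-axes would presumably be learnable parameters and the mask differentiable; the abstract does not give those details.

```python
import numpy as np

def elliptical_mask(height, width, a, b):
    """Binary ellipse centered on the zero frequency of a shifted spectrum.

    a and b are the semi-axes (in frequency bins) along the horizontal and
    vertical axes. They are fixed here; treating them as learnable is an
    assumption about how such a mask could be trained.
    """
    v = np.arange(height) - height // 2   # vertical frequency index
    u = np.arange(width) - width // 2     # horizontal frequency index
    vv, uu = np.meshgrid(v, u, indexing="ij")
    return ((uu / a) ** 2 + (vv / b) ** 2 <= 1.0).astype(np.float64)

def split_frequencies(feature, a, b):
    """Split a 2D feature map into low- and high-frequency parts."""
    # Move the zero frequency to the center so the ellipse selects low frequencies.
    spectrum = np.fft.fftshift(np.fft.fft2(feature))
    mask = elliptical_mask(feature.shape[0], feature.shape[1], a, b)
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spectrum * (1.0 - mask))).real
    return low, high

# Hypothetical single-channel feature map standing in for an RGB or depth feature.
rng = np.random.default_rng(0)
rgb_feat = rng.standard_normal((16, 16))
low, high = split_frequencies(rgb_feat, a=4.0, b=2.0)
```

Because the two masks sum to one, `low + high` reconstructs the input feature map, so the split loses no information. Choosing `a` different from `b` lets the mask treat horizontal and vertical frequencies differently.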
