FFNet: Frequency Fusion Network for Semantic Scene Completion

Authors

  • Xuzhi Wang, College of Intelligence and Computing, Tianjin University
  • Di Lin, College of Intelligence and Computing, Tianjin University
  • Liang Wan, College of Intelligence and Computing, Tianjin University

DOI:

https://doi.org/10.1609/aaai.v36i3.20156

Keywords:

Computer Vision (CV)

Abstract

Semantic scene completion (SSC) requires estimating the 3D geometric occupancy of objects in a scene along with their object categories. Many current methods use RGB-D images to capture the geometric and semantic information of objects. These methods fuse RGB and depth information with simple but popular spatial- and channel-wise operations. However, they ignore the large discrepancy between RGB and depth data and the measurement uncertainty of depth data. To address this problem, we propose the Frequency Fusion Network (FFNet), a novel method that boosts semantic scene completion by making better use of RGB-D data. Unlike approaches that operate directly on features extracted by convolution, FFNet explicitly correlates the RGB-D data in the frequency domain. The network then uses the correlated information to guide feature learning from the RGB and depth images, respectively. Moreover, FFNet accounts for the properties of the different frequency components of RGB-D features. It uses a learnable elliptical mask to decompose the features learned from the RGB and depth images, attending to various frequencies to facilitate the correlation of RGB-D data. We evaluate FFNet extensively on public SSC benchmarks, where it surpasses state-of-the-art methods. The code package of FFNet is available at https://github.com/alanWXZ/FFNet.
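The abstract names two frequency-domain operations without giving their formulas. The NumPy sketch below illustrates one plausible reading of them, not the authors' implementation (the linked repository holds that). A soft elliptical mask over the centred 2-D FFT splits a feature map into low- and high-frequency bands, and RGB and depth features are correlated through their cross-spectrum, which by the correlation theorem equals a per-channel circular cross-correlation. The function names, the semi-axis parameters `a` and `b`, the sigmoid `sharpness`, and the use of the cross-spectrum as the correlation operator are all hypothetical choices made for illustration.

```python
# Hypothetical sketch of frequency decomposition and RGB-D correlation;
# not the FFNet reference code.
import numpy as np


def elliptical_mask(h, w, a, b, sharpness=10.0):
    """Soft elliptical low-pass mask over a centred (fftshift-ed) H x W spectrum.

    a, b: ellipse semi-axes along the vertical and horizontal frequency axes,
    as fractions of the Nyquist band (assumed parameterisation).
    Values are close to 1 inside the ellipse and close to 0 outside it.
    """
    # Normalised frequency coordinates; zero frequency sits at (h // 2, w // 2).
    fy = (np.arange(h) - h // 2) / (h / 2)
    fx = (np.arange(w) - w // 2) / (w / 2)
    yy, xx = np.meshgrid(fy, fx, indexing="ij")
    r = (yy / a) ** 2 + (xx / b) ** 2  # r < 1 inside the ellipse
    # A sigmoid instead of a hard threshold keeps the mask smooth in a and b,
    # which is what would let a network learn them; clip to avoid exp overflow.
    z = np.clip(sharpness * (r - 1.0), -50.0, 50.0)
    return 1.0 / (1.0 + np.exp(z))


def _to_spatial(shifted_spec):
    """Undo the centring shift and return the real part of the inverse 2-D FFT."""
    spec = np.fft.ifftshift(shifted_spec, axes=(-2, -1))
    return np.real(np.fft.ifft2(spec, axes=(-2, -1)))


def decompose(feat, a, b):
    """Split a (C, H, W) feature map into low- and high-frequency parts."""
    _, h, w = feat.shape
    spec = np.fft.fftshift(np.fft.fft2(feat, axes=(-2, -1)), axes=(-2, -1))
    m = elliptical_mask(h, w, a, b)  # (H, W), broadcast over channels
    low = _to_spatial(spec * m)
    high = _to_spatial(spec * (1.0 - m))
    return low, high


def frequency_correlation(rgb_feat, depth_feat):
    """Correlate two (C, H, W) maps via the cross-spectrum F_rgb * conj(F_depth).

    The inverse FFT of the cross-spectrum is the circular cross-correlation
    corr[c, k] = sum_n rgb[c, n + k] * depth[c, n], computed per channel.
    """
    f_rgb = np.fft.fft2(rgb_feat, axes=(-2, -1))
    f_dep = np.fft.fft2(depth_feat, axes=(-2, -1))
    return np.real(np.fft.ifft2(f_rgb * np.conj(f_dep), axes=(-2, -1)))


# Usage on random stand-ins for RGB and depth feature maps.
rng = np.random.default_rng(0)
rgb = rng.standard_normal((4, 16, 16))
depth = rng.standard_normal((4, 16, 16))
rgb_low, rgb_high = decompose(rgb, a=0.3, b=0.5)
depth_low, depth_high = decompose(depth, a=0.3, b=0.5)
low_corr = frequency_correlation(rgb_low, depth_low)
print(low_corr.shape)  # (4, 16, 16)
print(np.allclose(rgb_low + rgb_high, rgb))  # True
```

Because the mask and its complement sum to one, the two bands add back to the original feature map. A hard binary mask would keep that property but would give no useful gradient with respect to `a` and `b`, which is why this sketch assumes a sigmoid edge.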


Published

2022-06-28

How to Cite

Wang, X., Lin, D., & Wan, L. (2022). FFNet: Frequency Fusion Network for Semantic Scene Completion. Proceedings of the AAAI Conference on Artificial Intelligence, 36(3), 2550-2557. https://doi.org/10.1609/aaai.v36i3.20156

Issue

Vol. 36 No. 3

Section

AAAI Technical Track on Computer Vision III