One at a Time: Progressive Multi-Step Volumetric Probability Learning for Reliable 3D Scene Perception

Authors

  • Bohan Li — Shanghai Jiao Tong University, Shanghai, China; Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo, China
  • Yasheng Sun — Tokyo Institute of Technology, Tokyo, Japan
  • Jingxin Dong — Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo, China
  • Zheng Zhu — PhiGent Robotics, Beijing, China
  • Jinming Liu — Shanghai Jiao Tong University, Shanghai, China; Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo, China
  • Xin Jin — Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo, China
  • Wenjun Zeng — Shanghai Jiao Tong University, Shanghai, China; Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo, China

DOI:

https://doi.org/10.1609/aaai.v38i4.28085

Keywords:

CV: 3D Computer Vision, CV: Scene Analysis & Understanding, CV: Vision for Robotics & Autonomous Driving

Abstract

Numerous studies have investigated the pivotal role of reliable 3D volume representation in scene perception tasks such as multi-view stereo (MVS) and semantic scene completion (SSC). They typically construct 3D probability volumes directly from geometric correspondence, attempting to fully solve the scene perception task in a single forward pass. However, such a single-step solution makes it hard to learn accurate and convincing volumetric probability, especially in challenging regions with unexpected occlusions or complicated light reflections. Therefore, this paper proposes to decompose the complicated 3D volume representation learning into a sequence of generative steps to facilitate fine and reliable scene perception. Building on recent advances in strong generative diffusion models, we introduce a multi-step learning framework, dubbed VPD, dedicated to progressively refining the Volumetric Probability in a Diffusion process. Specifically, we first build a coarse probability volume from input images with off-the-shelf scene perception baselines; this volume then serves as a basic geometric prior conditioning a 3D diffusion UNet, which progressively achieves accurate probability distribution modeling. To handle corner cases in challenging areas, a Confidence-Aware Contextual Collaboration (CACC) module is developed to correct uncertain regions for reliable volumetric learning based on multi-scale contextual content. Moreover, an Online Filtering (OF) strategy is designed to maintain representation consistency for stable diffusion sampling. Extensive experiments are conducted on scene perception tasks including MVS and SSC to validate the efficacy of our method in learning reliable volumetric representations. Notably, on the SSC task, our work stands out as the first to surpass LiDAR-based methods on the SemanticKITTI dataset.
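The core idea of the abstract — refining a coarse probability volume over several diffusion-style steps instead of one forward pass — can be illustrated with a toy sketch. Everything here is an assumption for illustration only (the volume shape, the linear noise schedule, and the stand-in "denoiser" that blends the sample toward the coarse prior); it is not the paper's VPD model, CACC module, or OF strategy.

```python
import numpy as np

def softmax(x, axis=0):
    """Per-voxel normalization over the depth/class dimension."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def progressive_refine(coarse_logits, steps=10, rng=None):
    """Toy multi-step refinement: start from noise, and at each step
    blend the sample toward the coarse prior while the noise weight
    decays — a crude stand-in for a conditioned diffusion sampler."""
    rng = rng or np.random.default_rng(0)
    x = rng.standard_normal(coarse_logits.shape)      # pure noise init
    for t in range(steps, 0, -1):
        w = t / steps                                  # noise weight decays to 0
        x = (1 - w) * coarse_logits + w * x            # pull toward the prior
        x += 0.1 * w * rng.standard_normal(x.shape)    # shrinking stochastic step
    return softmax(x, axis=0)                          # per-voxel probabilities

# Hypothetical coarse volume: 8 depth bins over a 4x4 grid,
# with the prior peaking at depth bin 3 everywhere.
coarse = np.zeros((8, 4, 4))
coarse[3] = 5.0
probs = progressive_refine(coarse)
assert np.allclose(probs.sum(axis=0), 1.0)   # valid distribution per voxel
assert (probs.argmax(axis=0) == 3).all()     # refinement respects the prior
```

The point of the multi-step loop is that the sample is corrected a little at each iteration under the prior's guidance, rather than committing to a single-pass estimate; in the actual method a learned 3D UNet plays the role of the blending step here.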

Published

2024-03-24

How to Cite

Li, B., Sun, Y., Dong, J., Zhu, Z., Liu, J., Jin, X., & Zeng, W. (2024). One at a Time: Progressive Multi-Step Volumetric Probability Learning for Reliable 3D Scene Perception. Proceedings of the AAAI Conference on Artificial Intelligence, 38(4), 3028-3036. https://doi.org/10.1609/aaai.v38i4.28085

Section

AAAI Technical Track on Computer Vision III