Towards 3D Object-Centric Feature Learning for Semantic Scene Completion

Authors

  • Weihua Wang, Faculty of Robot Science and Engineering, Northeastern University, Shenyang, Liaoning, China; National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Shenyang, Liaoning, China
  • Yubo Cui, Faculty of Robot Science and Engineering, Northeastern University, Shenyang, Liaoning, China
  • Xiangru Lin, The University of Hong Kong
  • Zhiheng Li, Faculty of Robot Science and Engineering, Northeastern University, Shenyang, Liaoning, China
  • Zheng Fang, Faculty of Robot Science and Engineering, Northeastern University, Shenyang, Liaoning, China; National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Shenyang, Liaoning, China; The Key Laboratory of Data Analytics and Optimization for Smart Industry (Northeastern University), China

DOI

https://doi.org/10.1609/aaai.v40i12.37981

Abstract

Vision-based 3D Semantic Scene Completion (SSC) has received growing attention due to its potential in autonomous driving. While most existing approaches follow an ego-centric paradigm by aggregating and diffusing features over the entire scene, they often overlook fine-grained object-level details, leading to semantic and geometric ambiguities, especially in complex environments. To address this limitation, we propose Ocean, an object-centric prediction framework that decomposes the scene into individual object instances to enable more accurate semantic occupancy prediction. Specifically, we first employ a lightweight segmentation model, MobileSAM, to extract instance masks from the input image. Then, we introduce a 3D Semantic Group Attention module that leverages linear attention to aggregate object-centric features in 3D space. To handle segmentation errors and missing instances, we further design a Global Similarity-Guided Attention module that leverages segmentation features for global interaction. Finally, we propose an Instance-aware Local Diffusion module that improves instance features through a generative process and subsequently refines the scene representation in the BEV space. Extensive experiments on the SemanticKITTI and SSCBench-KITTI360 benchmarks demonstrate that Ocean achieves state-of-the-art performance, with mIoU scores of 17.40 and 20.28, respectively.
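
The abstract describes aggregating object-centric features with linear attention, where features are grouped by instance. The sketch below illustrates that general idea only and is not the authors' 3D Semantic Group Attention module: it assumes each token carries a hypothetical integer group id (for example, an instance-mask index), uses the common elu(x) + 1 feature map, and restricts the key-value summaries to tokens that share a group. The function names and the grouping scheme are assumptions made for illustration.

```python
import math
from collections import defaultdict


def feature_map(x):
    # elu(x) + 1, a positive feature map often used in linear attention.
    # For v > 0 this is v + 1; otherwise it is exp(v).
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def grouped_linear_attention(queries, keys, values, group_ids):
    """Linear attention where each token attends only within its own group.

    queries, keys: lists of length-d_k vectors; values: list of length-d_v
    vectors; group_ids: one hashable id per token (hypothetical instance id).
    """
    d_k = len(keys[0])
    d_v = len(values[0])

    # Per-group summaries: kv[g] = sum_j phi(k_j) v_j^T, k_sum[g] = sum_j phi(k_j).
    kv = defaultdict(lambda: [[0.0] * d_v for _ in range(d_k)])
    k_sum = defaultdict(lambda: [0.0] * d_k)
    for k, v, g in zip(keys, values, group_ids):
        phi_k = feature_map(k)
        for a in range(d_k):
            k_sum[g][a] += phi_k[a]
            for b in range(d_v):
                kv[g][a][b] += phi_k[a] * v[b]

    # Each query reads its group's summary, so the cost is linear in token count.
    outputs = []
    for q, g in zip(queries, group_ids):
        phi_q = feature_map(q)
        norm = sum(phi_q[a] * k_sum[g][a] for a in range(d_k))
        outputs.append(
            [sum(phi_q[a] * kv[g][a][b] for a in range(d_k)) / norm
             for b in range(d_v)]
        )
    return outputs
```

Because the output for each token is a normalized, positively weighted average of the values in its group, a group whose tokens all share the same value returns that value unchanged, and tokens in different groups never mix.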

Published

2026-03-14

How to Cite

Wang, W., Cui, Y., Lin, X., Li, Z., & Fang, Z. (2026). Towards 3D Object-Centric Feature Learning for Semantic Scene Completion. Proceedings of the AAAI Conference on Artificial Intelligence, 40(12), 10136–10144. https://doi.org/10.1609/aaai.v40i12.37981

Section

AAAI Technical Track on Computer Vision IX