SPSC: Sparse and Scalable Multi-Modal 3D Occupancy Prediction for Autonomous Driving

Qingju Guo; Shuang Li; Binhui Xie; Jing Geng; Wei Li

doi:10.1609/aaai.v40i6.42441

Authors

Qingju Guo, Beijing Institute of Technology
Shuang Li, Beihang University
Binhui Xie, Beijing Institute of Technology
Jing Geng, Beijing Institute of Technology
Wei Li, Nanjing University


Abstract

3D semantic occupancy prediction offers a nuanced representation of the surrounding environment, which is crucial for ensuring the safety of autonomous driving. However, fine-grained scene representations inevitably result in cubic growth in data scale, which imposes substantial demands on model architecture and computational complexity, especially in high-resolution scenarios. Existing approaches for handling high-resolution scenes typically obtain fine-grained features by grid sampling on a low-resolution feature map, resulting in limited sparsity and insufficient feature interaction. This paper presents SPSC, a framework that leverages SParse representation and SCalable feature interaction to address these challenges. Specifically, we maintain sparsity by progressively pruning unoccupied queries during the coarse-to-fine process, thereby reducing the scale of data that the model needs to handle. Subsequently, we introduce query serialization, which transforms queries into an ordered sequence while preserving their spatial structure. This enables fine-grained feature interaction while maintaining linear computational complexity and a larger receptive field. Without complex architectural designs, SPSC significantly outperforms state-of-the-art approaches, with relative mIoU improvements of 12.0%, 11.0%, and 4.8% on the nuScenes-Occupancy dataset under the multi-modal, LiDAR, and camera settings, respectively.
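
The two mechanisms named in the abstract, coarse-to-fine pruning of unoccupied queries and query serialization, can be illustrated with a toy sketch. This is not the authors' implementation. The abstract does not say which ordering the serialization uses, so a Z-order (Morton) curve is assumed here as one common way to flatten 3D coordinates while keeping nearby voxels close in the sequence. The function names `split_and_prune`, `morton_code`, and `serialize`, the occupancy dictionary (a stand-in for a learned occupancy score), and the 0.5 threshold are all hypothetical.

```python
from typing import Dict, List, Tuple

Voxel = Tuple[int, int, int]


def split_and_prune(coarse: List[Voxel],
                    occupancy: Dict[Voxel, float],
                    threshold: float = 0.5) -> List[Voxel]:
    """Split each coarse query into its 8 children and keep the likely-occupied ones.

    `occupancy` is a hypothetical stand-in for a predicted occupancy score
    at the finer resolution; children missing from it count as empty.
    """
    fine: List[Voxel] = []
    for x, y, z in coarse:
        for dx in (0, 1):
            for dy in (0, 1):
                for dz in (0, 1):
                    child = (2 * x + dx, 2 * y + dy, 2 * z + dz)
                    if occupancy.get(child, 0.0) >= threshold:
                        fine.append(child)
    return fine


def morton_code(voxel: Voxel, bits: int = 10) -> int:
    """Interleave the low `bits` bits of x, y, z into one Z-order key (assumed ordering)."""
    x, y, z = voxel
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code


def serialize(queries: List[Voxel]) -> List[Voxel]:
    """Order sparse queries along the Z-order curve to form a 1D sequence."""
    return sorted(queries, key=morton_code)


if __name__ == "__main__":
    scores = {(0, 0, 0): 0.9, (1, 1, 1): 0.8, (1, 0, 0): 0.1}
    kept = split_and_prune([(0, 0, 0)], scores)
    print(kept)             # surviving fine-level queries
    print(serialize(kept))  # the same queries as an ordered sequence
```

In this sketch, each refinement step multiplies the candidate count by 8, which is the cubic growth the abstract refers to; pruning keeps only the children that pass the threshold, so later stages operate on a sparse set. Once the survivors are sorted into a sequence, any sequence model whose cost is linear in length could process them, though which model SPSC uses is not stated in the abstract.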
