SPSC: Sparse and Scalable Multi-Modal 3D Occupancy Prediction for Autonomous Driving
DOI:
https://doi.org/10.1609/aaai.v40i6.42441
Abstract
3D semantic occupancy prediction offers a nuanced representation of the surrounding environment, which is crucial for ensuring the safety of autonomous driving. However, fine-grained scene representations inevitably result in cubic growth in data scale, which imposes substantial demands on model architecture and computational complexity, especially in high-resolution scenarios. Existing approaches for handling high-resolution scenes typically obtain fine-grained features by grid sampling on low-resolution feature maps, resulting in limited sparsity and insufficient feature interaction. This paper presents SPSC, a framework that leverages SParse representation and SCalable feature interaction to address these challenges. Specifically, we maintain sparsity by progressively pruning unoccupied queries during the coarse-to-fine process, thereby reducing the scale of data the model needs to handle. Subsequently, we introduce query serialization, which transforms queries into an ordered sequence while preserving their spatial structure. This enables fine-grained feature interaction with linear computational complexity and a larger receptive field. Without complex architectural designs, SPSC significantly outperforms state-of-the-art approaches, relatively improving mIoU by 12.0%, 11.0%, and 4.8% on the nuScenes-Occupancy dataset under the multi-modal, LiDAR, and camera settings, respectively.
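The two mechanisms the abstract describes, pruning unoccupied queries and serializing the survivors into a spatially coherent sequence, can be illustrated with a minimal sketch. The Morton (Z-order) curve, the occupancy threshold, and all function names below are assumptions chosen for illustration, not the paper's actual design:

```python
import numpy as np

def morton_code(coords: np.ndarray, bits: int = 10) -> np.ndarray:
    """Interleave the bits of (x, y, z) voxel indices into a Morton (Z-order)
    code; sorting by this code keeps spatially adjacent queries adjacent
    in the resulting 1-D sequence."""
    code = np.zeros(len(coords), dtype=np.uint64)
    for b in range(bits):
        for axis in range(3):
            code |= ((coords[:, axis].astype(np.uint64) >> np.uint64(b)) & np.uint64(1)) << np.uint64(3 * b + axis)
    return code

def prune_and_serialize(coords: np.ndarray, occ_prob: np.ndarray,
                        keep_thresh: float = 0.5) -> np.ndarray:
    """Drop queries predicted unoccupied (progressive pruning), then order
    the remaining queries along a space-filling curve so a sequence model
    can process them with linear-complexity interaction."""
    keep = occ_prob > keep_thresh            # pruning step
    coords = coords[keep]
    order = np.argsort(morton_code(coords))  # serialization step
    return coords[order]
```

A sequence model applied to the serialized queries then attends over a 1-D neighborhood that corresponds to a 3-D spatial neighborhood, which is how such schemes trade cubic dense-grid cost for linear cost in the number of occupied queries.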
Published
2026-03-14
How to Cite
Guo, Q., Li, S., Xie, B., Geng, J., & Li, W. (2026). SPSC: Sparse and Scalable Multi-Modal 3D Occupancy Prediction for Autonomous Driving. Proceedings of the AAAI Conference on Artificial Intelligence, 40(6), 4430–4438. https://doi.org/10.1609/aaai.v40i6.42441
Section
AAAI Technical Track on Computer Vision III