SPSC: Sparse and Scalable Multi-Modal 3D Occupancy Prediction for Autonomous Driving

Authors

  • Qingju Guo Beijing Institute of Technology
  • Shuang Li Beihang University
  • Binhui Xie Beijing Institute of Technology
  • Jing Geng Beijing Institute of Technology
  • Wei Li Nanjing University

DOI:

https://doi.org/10.1609/aaai.v40i6.42441

Abstract

3D semantic occupancy prediction offers a nuanced representation of the surrounding environment, which is crucial for ensuring the safety of autonomous driving. However, fine-grained scene representations inevitably result in cubic growth in data scale, which imposes substantial demands on model architecture and computational complexity, especially in high-resolution scenarios. Existing approaches for handling high-resolution scenes typically obtain fine-grained features by grid sampling on a low-resolution feature map, resulting in limited sparsity and insufficient feature interaction. This paper presents SPSC, a framework that leverages SParse representation and SCalable feature interaction to address these challenges. Specifically, we maintain sparsity by progressively pruning unoccupied queries during the coarse-to-fine process, thereby reducing the scale of data that the model needs to handle. Subsequently, we introduce query serialization, which transforms queries into an ordered sequence while preserving their spatial structure. This enables fine-grained feature interaction while maintaining linear computational complexity and a larger receptive field. Without complex architectural designs, SPSC significantly outperforms SOTA approaches, with relative mIoU improvements of 12.0%, 11.0%, and 4.8% on the nuScenes-Occupancy dataset under the multi-modal, LiDAR, and camera settings, respectively.
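The two ideas named in the abstract, pruning unoccupied queries during coarse-to-fine refinement and serializing the surviving queries into an ordered sequence, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say which ordering is used, so a Z-order (Morton) curve is assumed here as one common way to keep spatially close voxels close in the sequence. The function names, the 2x2x2 split, and the occupancy threshold of 0.5 are all hypothetical.

```python
def morton_key(x, y, z, bits=10):
    """Interleave the bits of (x, y, z) into one Z-order key (assumed ordering)."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (3 * i)
        key |= ((y >> i) & 1) << (3 * i + 1)
        key |= ((z >> i) & 1) << (3 * i + 2)
    return key


def prune_and_refine(coarse, threshold=0.5):
    """Drop coarse voxels scored as empty, then split each survivor into 2x2x2 children.

    `coarse` maps an integer voxel coordinate to a hypothetical occupancy score.
    """
    fine = []
    for (x, y, z), score in coarse.items():
        if score < threshold:
            continue  # pruned: no fine queries are created for this voxel
        for dx in (0, 1):
            for dy in (0, 1):
                for dz in (0, 1):
                    fine.append((2 * x + dx, 2 * y + dy, 2 * z + dz))
    return fine


def serialize(queries):
    """Order sparse 3D queries as a 1D sequence along the Z-order curve."""
    return sorted(queries, key=lambda q: morton_key(*q))


# Toy coarse grid: two voxels look occupied, two look empty (made-up scores).
coarse = {(0, 0, 0): 0.9, (1, 0, 0): 0.1, (0, 1, 0): 0.2, (1, 1, 1): 0.7}
fine = prune_and_refine(coarse)
sequence = serialize(fine)
print(len(fine), "of", 8 * len(coarse), "fine queries kept")
print(sequence[:4])
```

In this toy case, refining only the two surviving voxels creates 16 fine queries instead of the 32 a dense refinement would create. After sorting, the eight children of each coarse voxel sit next to each other in the sequence, so a sequence model that mixes features among neighbouring positions also mixes features among spatial neighbours.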

Published

2026-03-14

How to Cite

Guo, Q., Li, S., Xie, B., Geng, J., & Li, W. (2026). SPSC: Sparse and Scalable Multi-Modal 3D Occupancy Prediction for Autonomous Driving. Proceedings of the AAAI Conference on Artificial Intelligence, 40(6), 4430–4438. https://doi.org/10.1609/aaai.v40i6.42441

Section

AAAI Technical Track on Computer Vision III