OctOcc: High-Resolution 3D Occupancy Prediction with Octree
DOI:
https://doi.org/10.1609/aaai.v38i5.28234Keywords:
CV: Vision for Robotics & Autonomous Driving, CV: SegmentationAbstract
3D semantic occupancy has garnered considerable attention due to its abundant structural information encompassing the entire scene in autonomous driving. However, existing 3D occupancy prediction methods contend with the constraint of low-resolution 3D voxel features arising from the limitation of computational memory. To address this limitation and achieve a more fine-grained representation of 3D scenes, we propose OctOcc, a novel octree-based approach for 3D semantic occupancy prediction. OctOcc is conceptually rooted in the observation that the vast majority of 3D space is left unoccupied. Capitalizing on this insight, we endeavor to cultivate memory-efficient high-resolution 3D occupancy predictions by mitigating superfluous cross-attentions. Specifically, we devise a hierarchical octree structure that selectively generates finer-grained cross-attentions solely in potentially occupied regions. Extending our inquiry beyond 3D space, we identify analogous redundancies within another side of cross attentions, 2D images. Consequently, a 2D image feature filtering network is conceived to expunge extraneous regions. Experimental results demonstrate that the proposed OctOcc significantly outperforms existing methods on nuScenes and SemanticKITTI datasets with limited memory consumption.Downloads
Published
2024-03-24
How to Cite
Ouyang, W., Song, X., Feng, B., & Xu, Z. (2024). OctOcc: High-Resolution 3D Occupancy Prediction with Octree. Proceedings of the AAAI Conference on Artificial Intelligence, 38(5), 4369-4377. https://doi.org/10.1609/aaai.v38i5.28234
Issue
Section
AAAI Technical Track on Computer Vision IV