Robust Single-Stage Fully Sparse 3D Object Detection via Detachable Latent Diffusion

Authors

  • Wentao Qu Nanjing University of Science and Technology, China
  • Guofeng Mei Fondazione Bruno Kessler, Italy
  • Jing Wang HiDream.ai, China
  • Yujiao Wu Commonwealth Scientific and Industrial Research Organisation, Australia
  • Xiaoshui Huang Shanghai Jiaotong University, China
  • Liang Xiao Nanjing University of Science and Technology, China

DOI:

https://doi.org/10.1609/aaai.v40i11.37819

Abstract

Denoising Diffusion Probabilistic Models (DDPMs) have shown success in robust 3D object detection. Existing methods often rely on score matching from 3D boxes or on pre-trained diffusion priors. However, they typically require multi-step iteration at inference, which limits efficiency. To address this, we propose RSDNet, a Robust single-stage fully Sparse 3D object Detection Network with a Detachable Latent Framework (DLF) of DDPMs. Specifically, RSDNet learns the denoising process in latent feature spaces through lightweight denoising networks like multi-level denoising autoencoders (DAEs). This enables RSDNet to effectively understand scene distributions under multi-level perturbations, achieving robust and reliable detection. Meanwhile, we reformulate the noising and denoising mechanisms of DDPMs so that DLF can construct multi-type and multi-level noise samples and targets, enhancing RSDNet's robustness to multiple perturbations. Furthermore, we introduce semantic-geometric conditional guidance to perceive object boundaries and shapes. This alleviates the missing-center-feature problem in sparse representations and enables RSDNet to operate in a fully sparse detection pipeline. Moreover, the detachable denoising network design of DLF enables RSDNet to perform single-step detection at inference, further enhancing detection efficiency. Extensive experiments on public benchmarks show that RSDNet can outperform existing methods, achieving state-of-the-art detection performance.
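
The idea of a training-only denoising branch over noised latent features can be illustrated with a minimal sketch. It uses the standard DDPM forward noising rule, z_t = sqrt(ᾱ_t) z_0 + sqrt(1 − ᾱ_t) ε, and not the reformulated mechanism the abstract describes, whose details are not given here. All names (`linear_alpha_bar`, `add_noise`, `ToyDetector`), the linear schedule, and the placeholder backbone are assumptions made for illustration only.

```python
import math
import random


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bar, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def add_noise(latent, alpha_bar_t, rng):
    """Standard DDPM forward step: z_t = sqrt(a) * z_0 + sqrt(1 - a) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in latent]
    noisy = [
        math.sqrt(alpha_bar_t) * z + math.sqrt(1.0 - alpha_bar_t) * e
        for z, e in zip(latent, eps)
    ]
    return noisy, eps


class ToyDetector:
    """Hypothetical stand-in: a backbone plus an optional denoising branch."""

    def __init__(self, denoiser=None):
        self.denoiser = denoiser  # None means the branch has been detached

    def backbone(self, points):
        # Placeholder feature extractor (not the paper's sparse backbone).
        return [2.0 * p for p in points]

    def forward(self, points, alpha_bar_t=None, rng=None):
        latent = self.backbone(points)
        if self.denoiser is None or alpha_bar_t is None:
            # Inference path: one pass, no iterative diffusion steps.
            return latent, None
        # Training path: noise the latent and return (prediction, noise target).
        noisy, eps = add_noise(latent, alpha_bar_t, rng)
        return latent, (self.denoiser(noisy), eps)


if __name__ == "__main__":
    schedule = linear_alpha_bar(10)
    rng = random.Random(0)
    train_model = ToyDetector(denoiser=lambda z: z)  # identity denoiser as a stub
    for level in (1, 5, 9):  # several noise levels from the same schedule
        _, aux = train_model.forward([0.5, -1.0], schedule[level], rng)
        print(level, aux[0])
    print(ToyDetector().forward([0.5, -1.0]))  # detached: latent only
```

In this sketch, removing the denoiser leaves the detection path unchanged, which is the property that would allow single-pass inference. How RSDNet couples the branch to the backbone during training is not specified in the abstract.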

Published

2026-03-14

How to Cite

Qu, W., Mei, G., Wang, J., Wu, Y., Huang, X., & Xiao, L. (2026). Robust Single-Stage Fully Sparse 3D Object Detection via Detachable Latent Diffusion. Proceedings of the AAAI Conference on Artificial Intelligence, 40(11), 8668–8676. https://doi.org/10.1609/aaai.v40i11.37819

Section

AAAI Technical Track on Computer Vision VIII