FD3D: Exploiting Foreground Depth Map for Feature-Supervised Monocular 3D Object Detection

Authors

  • Zizhang Wu, Fudan University
  • Yuanzhu Gan, ZongmuTech
  • Yunzhe Wu, ZongmuTech
  • Ruihao Wang, ZongmuTech
  • Xiaoquan Wang, ExploAI
  • Jian Pu, Fudan University

DOI:

https://doi.org/10.1609/aaai.v38i6.28436

Keywords:

CV: 3D Computer Vision, CV: Object Detection & Categorization

Abstract

Monocular 3D object detection usually adopts direct or hierarchical label supervision. Recently, distillation supervision has been used to transfer spatial knowledge from LiDAR- or stereo-based teacher networks to monocular detectors, but a domain gap remains. To mitigate this issue and make adequate use of the labels, we propose FD3D, which exploits the foreground depth map for feature-supervised monocular 3D object detection. FD3D develops high-quality instructive intermediate features to conduct auxiliary feature supervision, taking only the original image and the annotation foreground object-wise depth map (AFOD) as input. We build an instructive feature generation network that creates instructive spatial features based on the correlation between image features and the pre-processed AFOD, where the AFOD focuses attention only on foreground objects to give clearer guidance for the detection task. Moreover, we apply the auxiliary feature supervision at both the pixel level and the distribution level to achieve comprehensive spatial knowledge guidance. Extensive experiments demonstrate that our method achieves state-of-the-art performance on both the KITTI and nuScenes datasets, with no external data and no extra inference computational cost. We also conduct quantitative and qualitative studies to reveal the effectiveness of our designs.

Published

2024-03-24

How to Cite

Wu, Z., Gan, Y., Wu, Y., Wang, R., Wang, X., & Pu, J. (2024). FD3D: Exploiting Foreground Depth Map for Feature-Supervised Monocular 3D Object Detection. Proceedings of the AAAI Conference on Artificial Intelligence, 38(6), 6189–6197. https://doi.org/10.1609/aaai.v38i6.28436

Section

AAAI Technical Track on Computer Vision V