From Dataset to Real-world: General 3D Object Detection via Generalized Cross-domain Few-shot Learning

Shuangzhi Li; Junlong Shen; Lei Ma; Xingyu Li

doi:10.1609/aaai.v40i8.37569

Authors

Shuangzhi Li, University of Alberta
Junlong Shen, University of Alberta
Lei Ma, The University of Tokyo; University of Alberta
Xingyu Li, University of Alberta

Abstract

LiDAR-based 3D object detection models often struggle to generalize to real-world environments because existing datasets offer limited object diversity. To address this, we introduce the first generalized cross-domain few-shot (GCFS) task in 3D object detection, which aims to adapt a source-pretrained model to both common and novel classes in a new domain using only few-shot annotations. We propose a unified framework that learns stable target semantics under limited supervision by bridging 2D open-set semantics with 3D spatial reasoning. Specifically, an image-guided multi-modal fusion module injects transferable 2D semantic cues into the 3D pipeline via vision-language models, while a physically-aware box search improves 2D-to-3D alignment using LiDAR priors. To capture class-specific semantics from sparse data, we further introduce contrastive-enhanced prototype learning, which encodes few-shot instances into discriminative semantic anchors and stabilizes representation learning. Extensive experiments on GCFS benchmarks demonstrate the effectiveness and generality of our approach in realistic deployment settings.
