From Dataset to Real-world: General 3D Object Detection via Generalized Cross-domain Few-shot Learning
DOI:
https://doi.org/10.1609/aaai.v40i8.37569
Abstract
LiDAR-based 3D object detection models often struggle to generalize to real-world environments due to limited object diversity in existing datasets. To address this, we introduce the first generalized cross-domain few-shot (GCFS) task in 3D object detection, aiming to adapt a source-pretrained model to both common and novel classes in a new domain with only few-shot annotations. We propose a unified framework that learns stable target semantics under limited supervision by bridging 2D open-set semantics with 3D spatial reasoning. Specifically, an image-guided multi-modal fusion injects transferable 2D semantic cues into the 3D pipeline via vision-language models, while a physically-aware box search enhances 2D-to-3D alignment via LiDAR priors. To capture class-specific semantics from sparse data, we further introduce contrastive-enhanced prototype learning, which encodes few-shot instances into discriminative semantic anchors and stabilizes representation learning. Extensive experiments on GCFS benchmarks demonstrate the effectiveness and generality of our approach in realistic deployment settings.
Published
2026-03-14
How to Cite
Li, S., Shen, J., Ma, L., & Li, X. (2026). From Dataset to Real-world: General 3D Object Detection via Generalized Cross-domain Few-shot Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(8), 6415–6423. https://doi.org/10.1609/aaai.v40i8.37569
Section
AAAI Technical Track on Computer Vision V