Less Is Better: Sparse Instance Learning for Cross-Domain Few-Shot Object Detection

Authors

  • Yali Huang, School of Computer and Artificial Intelligence, Zhengzhou University, China
  • Jie Mei, Dongfeng Commercial Vehicle Co., Ltd., China
  • Ziyi Wu, School of Computer and Artificial Intelligence, Zhengzhou University, China
  • Yiming Yang, School of Computer and Artificial Intelligence, Zhengzhou University, China
  • Hongru Zhao, School of Computer and Artificial Intelligence, Zhengzhou University, China; Engineering Research Center of Intelligent Swarm Systems, Ministry of Education, China; National SuperComputing Center in Zhengzhou, Zhengzhou, China
  • Mingyuan Jiu, School of Computer and Artificial Intelligence, Zhengzhou University, China; Engineering Research Center of Intelligent Swarm Systems, Ministry of Education, China; National SuperComputing Center in Zhengzhou, Zhengzhou, China
  • Hichem Sahbi, Sorbonne University, CNRS, LIP6, F-75005, Paris, France

DOI:

https://doi.org/10.1609/aaai.v40i7.37432

Abstract

Cross-Domain Few-Shot Object Detection (CD-FSOD) is an extremely challenging task because of the inherent data scarcity and the substantial domain shift between the source and target domains. Existing methods often suffer from overfitting and noisy feature representations, which hinder the construction of discriminative class prototypes in the target domain. In this paper, we propose SI-ViTO, a novel framework based on sparse instance learning for CD-FSOD, which leverages instance sparsity to achieve better detection with sparser representations. SI-ViTO adopts a dual-stage sparsity module that applies instance feature sparsity not only to the few-shot support images but also to the query images. This dual sparsity enables the model to preserve salient foreground semantics while filtering out redundant or noisy information. Furthermore, a new prototype calibration strategy dynamically refines the class prototypes with query instances to accelerate prototype adaptation. Extensive experiments on CD-FSOD benchmarks show that SI-ViTO outperforms state-of-the-art methods, demonstrating that fewer but discriminative representations yield better cross-domain few-shot object detection performance than more abundant ones.
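The abstract does not specify how instance sparsity or prototype calibration are computed. Purely as an illustration of the general idea, the sketch below assumes that (1) instance sparsity keeps the top-k instance tokens ranked by feature norm, used here as a stand-in saliency score; (2) class prototypes are the means of the kept support tokens; and (3) calibration is a momentum update that moves each prototype toward sparsified query instances that match it confidently under cosine similarity. The function names and the parameters `keep_ratio`, `momentum`, and `min_sim` are hypothetical and are not taken from SI-ViTO.

```python
import numpy as np

EPS = 1e-8  # guards against division by zero for all-zero feature vectors


def sparsify_instances(feats: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Keep the top-k rows of an (n_tokens, d) feature matrix, ranked by L2 norm.

    Hypothetical stand-in for instance feature sparsity: the norm is used as a
    crude saliency score, and the kept rows stay in their original order.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n = feats.shape[0]
    k = max(1, int(round(keep_ratio * n)))
    scores = np.linalg.norm(feats, axis=1)
    keep = np.argsort(-scores, kind="stable")[:k]
    return feats[np.sort(keep)]


def build_prototypes(support: dict[int, np.ndarray], keep_ratio: float) -> np.ndarray:
    """Return one prototype per class (sorted by class id) from sparsified support features."""
    return np.stack(
        [sparsify_instances(support[c], keep_ratio).mean(axis=0) for c in sorted(support)]
    )


def calibrate_prototypes(
    protos: np.ndarray,
    query: np.ndarray,
    keep_ratio: float,
    momentum: float = 0.1,
    min_sim: float = 0.5,
) -> np.ndarray:
    """Refine prototypes with sparsified query instances via a momentum update.

    Each kept query token is assigned to its most cosine-similar prototype; only
    assignments with similarity >= min_sim contribute to that prototype's update.
    """
    q = sparsify_instances(query, keep_ratio)
    pn = protos / (np.linalg.norm(protos, axis=1, keepdims=True) + EPS)
    qn = q / (np.linalg.norm(q, axis=1, keepdims=True) + EPS)
    sims = qn @ pn.T                      # (n_kept_query, n_classes)
    assign = sims.argmax(axis=1)          # nearest prototype per query token
    conf = sims.max(axis=1)               # similarity to that prototype
    updated = protos.copy()
    for c in range(protos.shape[0]):
        mask = (assign == c) & (conf >= min_sim)
        if mask.any():
            updated[c] = (1 - momentum) * protos[c] + momentum * q[mask].mean(axis=0)
    return updated


if __name__ == "__main__":
    # Toy 2-D features: the low-norm third row of each matrix plays the role of noise.
    support = {
        0: np.array([[1.0, 0.0], [3.0, 0.0], [0.1, 0.1]]),
        1: np.array([[0.0, 2.0], [0.0, 4.0], [0.05, 0.0]]),
    }
    query = np.array([[4.0, 0.0], [0.0, 0.01], [0.0, 2.0]])
    protos = build_prototypes(support, keep_ratio=0.67)
    print(protos)
    print(calibrate_prototypes(protos, query, keep_ratio=0.67, momentum=0.5))
```

In this toy run, sparsification drops the low-norm row from each support set and from the query, so the noisy tokens never influence the prototypes or their calibration. Ranking by norm and updating with a fixed momentum are simple choices made for readability; a learned saliency score or an adaptive update rate would be equally plausible readings of the abstract.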

Published

2026-03-14

How to Cite

Huang, Y., Mei, J., Wu, Z., Yang, Y., Zhao, H., Jiu, M., & Sahbi, H. (2026). Less Is Better: Sparse Instance Learning for Cross-Domain Few-Shot Object Detection. Proceedings of the AAAI Conference on Artificial Intelligence, 40(7), 5176–5184. https://doi.org/10.1609/aaai.v40i7.37432

Section

AAAI Technical Track on Computer Vision IV