Weakly Supervised Few-Shot Object Detection with DETR

Authors

  • Chenbo Zhang (Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, China)
  • Yinglu Zhang (Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, China)
  • Lu Zhang (Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, China)
  • Jiajia Zhao (Science and Technology on Complex System Control and Intelligent Agent Cooperation Laboratory, Beijing Electro-Mechanical Engineering Institute, China)
  • Jihong Guan (Department of Computer Science & Technology, Tongji University, China)
  • Shuigeng Zhou (Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, China)

DOI:

https://doi.org/10.1609/aaai.v38i7.28528

Keywords:

CV: Object Detection & Categorization, CV: Applications

Abstract

In recent years, few-shot object detection (FSOD) has become an increasingly important research topic in computer vision. However, existing FSOD methods require strong annotations, including category labels and bounding boxes, and their performance depends heavily on the quality of the box annotations. Acquiring such strong annotations is both expensive and time-consuming. This motivates the study of weakly supervised FSOD (WS-FSOD for short), which performs FSOD with only image-level annotations, i.e., category labels. In this paper, we propose a new and effective weakly supervised FSOD method named WFS-DETR. Through a well-designed pretraining process, WFS-DETR first acquires general object localization and integrity judgment capabilities on large-scale pretraining data. Then, it introduces object integrity into multiple-instance learning to solve the common local optimum problem by comprehensively exploiting both semantic and visual information. Finally, with simple fine-tuning, it transfers the knowledge learned from the base classes to the novel classes, which enables accurate detection of novel objects. Benefiting from this "pretraining-refinement" mechanism, WFS-DETR can achieve good generalization on different datasets. Extensive experiments also show that the proposed method clearly outperforms existing counterparts on the WS-FSOD task.

Published

2024-03-24

How to Cite

Zhang, C., Zhang, Y., Zhang, L., Zhao, J., Guan, J., & Zhou, S. (2024). Weakly Supervised Few-Shot Object Detection with DETR. Proceedings of the AAAI Conference on Artificial Intelligence, 38(7), 7015-7023. https://doi.org/10.1609/aaai.v38i7.28528

Issue

Section

AAAI Technical Track on Computer Vision VI