Weakly Supervised Few-Shot Object Detection with DETR

Chenbo Zhang; Yinglu Zhang; Lu Zhang; Jiajia Zhao; Jihong Guan; Shuigeng Zhou

doi:10.1609/aaai.v38i7.28528

Authors

Chenbo Zhang Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, China
Yinglu Zhang Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, China
Lu Zhang Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, China
Jiajia Zhao Science and Technology on Complex System Control and Intelligent Agent Cooperation Laboratory, Beijing Electro-Mechanical Engineering Institute, China
Jihong Guan Department of Computer Science & Technology, Tongji University, China
Shuigeng Zhou Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, China

Keywords:

CV: Object Detection & Categorization, CV: Applications

Abstract

In recent years, Few-shot Object Detection (FSOD) has become an increasingly important research topic in computer vision. However, existing FSOD methods require strong annotations, including category labels and bounding boxes, and their performance depends heavily on the quality of the box annotations. Acquiring such strong annotations is both expensive and time-consuming. This motivates the study of weakly supervised FSOD (WS-FSOD for short), which performs FSOD with only image-level annotations, i.e., category labels. In this paper, we propose a new and effective weakly supervised FSOD method named WFS-DETR. Through a well-designed pretraining process, WFS-DETR first acquires general object localization and integrity judgment capabilities on large-scale pretraining data. Then, it introduces object integrity into multiple-instance learning to solve the common local optimum problem by comprehensively exploiting both semantic and visual information. Finally, with simple fine-tuning, it transfers the knowledge learned from the base classes to the novel classes, which enables accurate detection of novel objects. Benefiting from this "pretraining-refinement" mechanism, WFS-DETR achieves good generalization on different datasets. Extensive experiments also show that the proposed method clearly outperforms existing counterparts on the WS-FSOD task.
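The abstract does not give the formulation, so the following is only an illustrative sketch of the general idea of bringing an integrity signal into multiple-instance learning; it is not the WFS-DETR implementation. It assumes a generic two-branch MIL head (a softmax over classes multiplied by a softmax over proposals) and a hypothetical per-proposal `integrity` score in [0, 1] that simply reweights each proposal. All function and parameter names are assumptions made for illustration.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def proposal_scores(cls_logits, det_logits, integrity=None):
    """Per-proposal, per-class MIL scores.

    cls_logits, det_logits: lists of shape [num_proposals][num_classes].
    integrity: optional list of length num_proposals with values in [0, 1]
               (hypothetical "how complete is this box" score).
    """
    num_proposals = len(cls_logits)
    num_classes = len(cls_logits[0])
    if integrity is None:
        integrity = [1.0] * num_proposals  # plain MIL, no reweighting

    # Classification branch: softmax over classes, per proposal.
    cls_probs = [softmax(row) for row in cls_logits]
    # Detection branch: softmax over proposals, per class.
    det_probs = [
        softmax([det_logits[i][k] for i in range(num_proposals)])
        for k in range(num_classes)
    ]
    return [
        [integrity[i] * cls_probs[i][k] * det_probs[k][i]
         for k in range(num_classes)]
        for i in range(num_proposals)
    ]


def image_level_scores(cls_logits, det_logits, integrity=None):
    """Sum proposal scores per class; these are trained against image labels."""
    scores = proposal_scores(cls_logits, det_logits, integrity)
    num_classes = len(scores[0])
    return [sum(row[k] for row in scores) for k in range(num_classes)]


def best_proposal(cls_logits, det_logits, class_index, integrity=None):
    """Index of the proposal with the highest score for one class."""
    scores = proposal_scores(cls_logits, det_logits, integrity)
    return max(range(len(scores)), key=lambda i: scores[i][class_index])
```

In a toy case where proposal 0 covers only a discriminative part and proposal 1 covers the whole object, calling `best_proposal` without `integrity` can select the part, which is the local optimum the abstract refers to, while passing a low integrity score for the part and a high one for the whole object shifts the selection toward the complete box.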
