Context-Transformer: Tackling Object Confusion for Few-Shot Detection

Authors

  • Ze Yang, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
  • Yali Wang, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
  • Xianyu Chen, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
  • Jianzhuang Liu, Huawei Noah's Ark Lab
  • Yu Qiao, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences

DOI:

https://doi.org/10.1609/aaai.v34i07.6957

Abstract

Few-shot object detection is a challenging but realistic scenario, where only a few annotated training images are available. A popular approach to this problem is transfer learning, i.e., fine-tuning a detector pretrained on a source-domain benchmark. However, such a transferred detector often fails to recognize new objects in the target domain, due to the low data diversity of the training samples. To tackle this problem, we propose a novel Context-Transformer within a concise deep transfer framework. Specifically, Context-Transformer can effectively leverage source-domain object knowledge as guidance and automatically exploit contexts from only a few training images in the target domain. Subsequently, it adaptively integrates these relational clues to enhance the discriminative power of the detector, in order to reduce object confusion in few-shot scenarios. Moreover, Context-Transformer can be flexibly embedded into popular SSD-style detectors, making it a plug-and-play module for end-to-end few-shot learning. Finally, we evaluate Context-Transformer on the challenging settings of few-shot detection and incremental few-shot detection. The experimental results show that our framework outperforms recent state-of-the-art approaches.
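To make the core idea of the abstract concrete, the sketch below illustrates one plausible way such a context-aware refinement could look in PyTorch: per-box features attend over pooled contextual features from the same image, and the aggregated relational clues are fused back before classification. This is only an illustrative simplification under assumed names, shapes, and dimensions (e.g., ContextTransformerSketch, feat_dim, num_ctx); it is not the authors' released implementation of Context-Transformer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContextTransformerSketch(nn.Module):
    """Illustrative sketch of a context-aware refinement module.

    Hypothetical simplification of the Context-Transformer idea: each
    prior-box feature attends to pooled contextual features from the same
    image, and the aggregated context sharpens the per-box class scores.
    All names and shapes here are assumptions for illustration only.
    """

    def __init__(self, feat_dim: int = 256, num_classes: int = 21):
        super().__init__()
        self.query = nn.Linear(feat_dim, feat_dim)   # embed prior-box features
        self.key = nn.Linear(feat_dim, feat_dim)     # embed context features
        self.value = nn.Linear(feat_dim, feat_dim)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, box_feats: torch.Tensor, ctx_feats: torch.Tensor) -> torch.Tensor:
        # box_feats: (num_boxes, feat_dim) features of SSD-style prior boxes
        # ctx_feats: (num_ctx, feat_dim) pooled context regions from the image
        q = self.query(box_feats)                                  # (num_boxes, feat_dim)
        k = self.key(ctx_feats)                                    # (num_ctx, feat_dim)
        v = self.value(ctx_feats)                                  # (num_ctx, feat_dim)
        attn = F.softmax(q @ k.t() / q.size(-1) ** 0.5, dim=-1)    # relational weights
        fused = box_feats + attn @ v                               # inject contextual clues
        return self.classifier(fused)                              # refined class scores


if __name__ == "__main__":
    module = ContextTransformerSketch(feat_dim=256, num_classes=21)
    boxes = torch.randn(100, 256)    # e.g., 100 prior-box features
    context = torch.randn(16, 256)   # e.g., 16 pooled context features
    print(module(boxes, context).shape)  # torch.Size([100, 21])
```

Because the module only consumes per-box and pooled context features, it can in principle sit on top of an existing SSD-style head as a drop-in refinement stage, which is the plug-and-play property the abstract emphasizes.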

Published

2020-04-03

How to Cite

Yang, Z., Wang, Y., Chen, X., Liu, J., & Qiao, Y. (2020). Context-Transformer: Tackling Object Confusion for Few-Shot Detection. Proceedings of the AAAI Conference on Artificial Intelligence, 34(07), 12653-12660. https://doi.org/10.1609/aaai.v34i07.6957

Issue

Vol. 34 No. 07 (2020)

Section

AAAI Technical Track: Vision