Context-Transformer: Tackling Object Confusion for Few-Shot Detection
Few-shot object detection is a challenging but realistic scenario, where only a few annotated training images are available for training detectors. A popular approach to handle this problem is transfer learning, i.e., fine-tuning a detector pretrained on a source-domain benchmark. However, such transferred detector often fails to recognize new objects in the target domain, due to low data diversity of training samples. To tackle this problem, we propose a novel Context-Transformer within a concise deep transfer framework. Specifically, Context-Transformer can effectively leverage source-domain object knowledge as guidance, and automatically exploit contexts from only a few training images in the target domain. Subsequently, it can adaptively integrate these relational clues to enhance the discriminative power of detector, in order to reduce object confusion in few-shot scenarios. Moreover, Context-Transformer is flexibly embedded in the popular SSD-style detectors, which makes it a plug-and-play module for end-to-end few-shot learning. Finally, we evaluate Context-Transformer on the challenging settings of few-shot detection and incremental few-shot detection. The experimental results show that, our framework outperforms the recent state-of-the-art approaches.