Co-mining: Self-Supervised Learning for Sparsely Annotated Object Detection

Authors

  • Tiancai Wang, Megvii Technology
  • Tong Yang, Megvii Technology
  • Jiale Cao, Tianjin University
  • Xiangyu Zhang, Megvii Technology

DOI:

https://doi.org/10.1609/aaai.v35i4.16385

Keywords:

Applications, Object Detection & Categorization, Scene Analysis & Understanding

Abstract

Object detectors usually achieve promising results with the supervision of complete instance annotations. However, their performance is far from satisfactory with sparse instance annotations. Most existing methods for sparsely annotated object detection either re-weight the loss of hard negative samples or convert the unlabeled instances into ignored regions to reduce the interference of false negatives. We argue that these strategies are insufficient since they can at most alleviate the negative effects caused by missing annotations. In this paper, we propose a simple but effective mechanism, called Co-mining, for sparsely annotated object detection. In our Co-mining, the two branches of a siamese network predict pseudo-label sets for each other. To enhance multi-view learning and better mine unlabeled instances, the original image and its augmented counterpart are used as the inputs of the two branches of the siamese network, respectively. Co-mining can serve as a general training mechanism applicable to most modern object detectors. Experiments are performed on the MS COCO dataset with three different sparsely annotated settings using two typical frameworks: the anchor-based detector RetinaNet and the anchor-free detector FCOS. Experimental results show that our Co-mining with RetinaNet achieves 1.4%∼2.1% improvements over different baselines and surpasses existing methods under the same sparsely annotated setting.
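To make the mechanism described above concrete, the following is a minimal sketch of one Co-mining training step, assuming a PyTorch-style detector that returns (boxes, scores, labels) tuples. The function names (detector, mine_pseudo_labels, merge_labels, criterion) and the 0.5 confidence threshold are illustrative assumptions, not the authors' implementation.

    import torch

    def mine_pseudo_labels(detections, score_thresh=0.5):
        # Keep high-confidence detections as pseudo ground truth for the
        # other branch. `detections` is assumed to be (boxes, scores, labels).
        boxes, scores, labels = detections
        keep = scores > score_thresh
        return boxes[keep], labels[keep]

    def merge_labels(sparse_gt, pseudo):
        # Union of the sparse annotations and the mined pseudo labels.
        # A real implementation would also drop pseudo boxes that overlap
        # existing annotations (e.g. by IoU); plain concatenation keeps
        # the sketch short.
        gt_boxes, gt_labels = sparse_gt
        p_boxes, p_labels = pseudo
        return (torch.cat([gt_boxes, p_boxes], dim=0),
                torch.cat([gt_labels, p_labels], dim=0))

    def co_mining_step(detector, image, aug_image, sparse_gt, criterion):
        # The two branches share weights (a siamese network) but see two
        # views: the original image and its augmented counterpart.
        det_orig = detector(image)
        det_aug = detector(aug_image)

        # Each branch mines pseudo labels for the other branch; these are
        # merged with the sparse annotations to reduce false negatives.
        gt_for_aug = merge_labels(sparse_gt, mine_pseudo_labels(det_orig))
        gt_for_orig = merge_labels(sparse_gt, mine_pseudo_labels(det_aug))

        # Both branches are trained against their complemented label sets.
        return criterion(det_orig, gt_for_orig) + criterion(det_aug, gt_for_aug)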

Published

2021-05-18

How to Cite

Wang, T., Yang, T., Cao, J., & Zhang, X. (2021). Co-mining: Self-Supervised Learning for Sparsely Annotated Object Detection. Proceedings of the AAAI Conference on Artificial Intelligence, 35(4), 2800-2808. https://doi.org/10.1609/aaai.v35i4.16385

Section

AAAI Technical Track on Computer Vision III