StarNet: towards Weakly Supervised Few-Shot Object Detection


  • Leonid Karlinsky IBM Research AI
  • Joseph Shtok IBM Research AI
  • Amit Alfassy IBM Research AI Technion
  • Moshe Lichtenstein IBM Research AI
  • Sivan Harary IBM Research AI
  • Eli Schwartz IBM Research AI, Tel-Aviv University
  • Sivan Doveh IBM Research AI
  • Prasanna Sattigeri IBM Research AI
  • Rogerio Feris IBM Research AI
  • Alex Bronstein Technion
  • Raja Giryes Tel-Aviv University


Object Detection & Categorization


Few-shot detection and classification have advanced significantly in recent years. Yet detection approaches require strong annotation (bounding boxes) both for pre-training and for adaptation to novel classes, while classification approaches rarely localize the objects in the scene. In this paper, we introduce StarNet, a few-shot model featuring an end-to-end differentiable, non-parametric star-model detection and classification head. Through this head, the backbone is meta-trained using only image-level labels. It learns features suited to jointly localizing and classifying previously unseen categories in few-shot test tasks, using a star model that geometrically matches the query and support images to find corresponding object instances. As a few-shot detector, StarNet does not require any bounding-box annotations, either during pre-training or for adaptation to novel classes. It can thus be applied to the previously unexplored and challenging task of Weakly Supervised Few-Shot Object Detection (WS-FSOD), where it attains significant improvements over the baselines. In addition, StarNet shows significant gains on few-shot classification benchmarks whose images are less tightly cropped around the objects (where object localization is key).
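The abstract describes the star-model head only at a high level. As a rough illustration of the underlying idea, the NumPy sketch below implements a generic star-model (generalized Hough) voting step on fixed feature maps. Each query location is softly matched to support locations, and each match casts a vote for where the object center should be in the query. The class whose votes pile up most consistently wins, and the vote peak gives its location. This is not the authors' implementation. The cosine similarity, the softmax temperature, the use of the support grid center as the object center, and the function names `star_vote_map` and `classify_and_localize` are all assumptions made for illustration. The actual StarNet head is differentiable and meta-trained end-to-end with the backbone, whereas this sketch only shows forward scoring.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Normalize feature vectors along the last axis (prep for cosine similarity)."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def star_vote_map(query, support, support_center, temperature=0.1):
    """Accumulate star-model votes for the object center on the query grid.

    query:          (Hq, Wq, C) feature map of the query image
    support:        (Hs, Ws, C) feature map of one support image
    support_center: (row, col) object center on the support grid (ASSUMED known)
    Returns an (Hq, Wq) vote map whose peak is the hypothesized object center.
    """
    Hq, Wq, C = query.shape
    Hs, Ws, _ = support.shape
    q = l2_normalize(query.reshape(-1, C))
    s = l2_normalize(support.reshape(-1, C))
    sim = q @ s.T  # (Hq*Wq, Hs*Ws) cosine similarities

    # Soft assignment of each query location to support locations (softmax),
    # scaled by non-negative similarity so that poor matches cast weak votes.
    assign = np.exp(sim / temperature)
    assign /= assign.sum(axis=1, keepdims=True)
    weight = assign * np.maximum(sim, 0.0)

    qr, qc = np.divmod(np.arange(Hq * Wq), Wq)  # query (row, col) per flat index
    sr, sc = np.divmod(np.arange(Hs * Ws), Ws)  # support (row, col) per flat index
    cr, cc = support_center
    votes = np.zeros((Hq, Wq))
    for j in range(Hs * Ws):
        # Match (query i, support j) votes for query location
        # i + (support_center - j), i.e. the star-model offset to the center.
        tr, tc = qr + (cr - sr[j]), qc + (cc - sc[j])
        ok = (tr >= 0) & (tr < Hq) & (tc >= 0) & (tc < Wq)
        np.add.at(votes, (tr[ok], tc[ok]), weight[ok, j])
    return votes


def classify_and_localize(query, supports, support_center):
    """Score each class by its peak vote; return (best class, center, scores)."""
    maps = {name: star_vote_map(query, feat, support_center)
            for name, feat in supports.items()}
    scores = {name: float(m.max()) for name, m in maps.items()}
    best = max(scores, key=scores.get)
    r, c = np.unravel_index(np.argmax(maps[best]), maps[best].shape)
    return best, (int(r), int(c)), scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    support_a = rng.standard_normal((3, 3, 32))
    support_b = rng.standard_normal((3, 3, 32))
    query = rng.standard_normal((6, 6, 32))
    query[2:5, 1:4] = support_a  # paste a class-"a" object centered at (3, 2)
    print(classify_and_localize(query, {"a": support_a, "b": support_b}, (1, 1)))
```

In the toy example, random vectors stand in for backbone features. Only image-level supervision is implied: the class label selects which support set to vote with, and the location emerges from where the votes agree, which is the property the abstract relies on for WS-FSOD.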




How to Cite

Karlinsky, L., Shtok, J., Alfassy, A., Lichtenstein, M., Harary, S., Schwartz, E., Doveh, S., Sattigeri, P., Feris, R., Bronstein, A., & Giryes, R. (2021). StarNet: towards Weakly Supervised Few-Shot Object Detection. Proceedings of the AAAI Conference on Artificial Intelligence, 35(2), 1743-1753. Retrieved from



AAAI Technical Track on Computer Vision I