Zero-Shot Object Detection with Textual Descriptions

Authors

  • Zhihui Li, University of New South Wales
  • Lina Yao, University of New South Wales
  • Xiaoqin Zhang, Wenzhou University
  • Xianzhi Wang, University of New South Wales
  • Salil Kanhere, University of New South Wales
  • Huaxiang Zhang, Shandong Normal University

DOI:

https://doi.org/10.1609/aaai.v33i01.33018690

Abstract

Object detection is important in real-world applications. Existing methods mainly focus on object detection with sufficient labelled training data or zero-shot object detection with only concept names. In this paper, we address the challenging problem of zero-shot object detection with natural language descriptions, which aims to simultaneously detect and recognize novel concept instances from textual descriptions. We propose a novel deep learning framework to jointly learn visual units, visual-unit attention and word-level attention, which are combined by element-wise multiplication to obtain a word-proposal affinity. To the best of our knowledge, this is the first work on zero-shot object detection with textual descriptions. Since there is no directly related work in the literature, we investigate plausible solutions based on existing zero-shot object detection methods for a fair comparison. We conduct extensive experiments on three challenging benchmark datasets. The results confirm the superiority of the proposed model.
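
The abstract names three learned components (visual units, visual-unit attention and word-level attention) and says they are combined by element-wise multiplication into a word-proposal affinity, but it gives no formulas. The sketch below is one possible reading, not the paper's implementation. Dot-product scoring, softmax normalization, the `word_query` vector, the summation over units, and all function and variable names are assumptions introduced here for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def word_proposal_affinity(proposals, words, units, word_query):
    """Return a (num_words x num_proposals) affinity matrix.

    ASSUMPTIONS (not taken from the paper):
      - proposals, words and units are lists of d-dimensional feature vectors
      - each proposal's response to the visual units is a softmax of dot products
      - visual-unit attention is a per-word softmax over the units
      - word-level attention is a softmax over words scored by `word_query`
    """
    # How strongly each region proposal activates each visual unit.
    unit_response = [softmax([dot(p, u) for u in units]) for p in proposals]
    # Which visual units each word attends to.
    unit_attention = [softmax([dot(w, u) for u in units]) for w in words]
    # How much each word of the description matters overall.
    word_attention = softmax([dot(w, word_query) for w in words])

    affinity = []
    for w_idx in range(len(words)):
        row = []
        for p_idx in range(len(proposals)):
            # Element-wise multiplication across visual units, then a sum
            # to reduce the product to a single score (the sum is an assumption).
            product = [a * r for a, r in
                       zip(unit_attention[w_idx], unit_response[p_idx])]
            row.append(word_attention[w_idx] * sum(product))
        affinity.append(row)
    return affinity


# Toy usage with hypothetical 2-dimensional features.
proposals = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
words = [[1.0, 0.2], [0.1, 0.9]]
units = [[1.0, 0.0], [0.0, 1.0]]
word_query = [0.3, 0.7]
affinity = word_proposal_affinity(proposals, words, units, word_query)
```

In this toy setup, each row of `affinity` corresponds to one word and each column to one region proposal. A detector built along these lines could score a proposal against a description by summing its column, though that aggregation step is likewise an assumption rather than something the abstract states.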

Published

2019-07-17

How to Cite

Li, Z., Yao, L., Zhang, X., Wang, X., Kanhere, S., & Zhang, H. (2019). Zero-Shot Object Detection with Textual Descriptions. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01), 8690-8697. https://doi.org/10.1609/aaai.v33i01.33018690

Section

AAAI Technical Track: Vision