Zero-Shot Object Detection with Textual Descriptions


  • Zhihui Li, University of New South Wales
  • Lina Yao, University of New South Wales
  • Xiaoqin Zhang, Wenzhou University
  • Xianzhi Wang, University of New South Wales
  • Salil Kanhere, University of New South Wales
  • Huaxiang Zhang, Shandong Normal University



Object detection is important in many real-world applications. Existing methods mainly focus either on object detection with sufficient labelled training data or on zero-shot object detection with only concept names. In this paper, we address the challenging problem of zero-shot object detection with natural language descriptions, which aims to simultaneously detect and recognize instances of novel concepts using their textual descriptions. We propose a novel deep learning framework that jointly learns visual units, visual-unit attention and word-level attention, which are combined via element-wise multiplication to compute word-proposal affinity. To the best of our knowledge, this is the first work on zero-shot object detection with textual descriptions. Since there is no directly related work in the literature, we investigate plausible solutions based on existing zero-shot object detection methods for a fair comparison. Extensive experiments on three challenging benchmark datasets confirm the superiority of the proposed model.
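The abstract states only that visual units, visual-unit attention and word-level attention are combined by element-wise multiplication to obtain word-proposal affinity; it does not give the exact formulation. The following is a minimal sketch under assumed shapes, not the authors' method: for one region proposal, a hypothetical W x K matrix `unit_scores` relates each of W description words to each of K visual units, `unit_attention` weights the K units for that proposal, and `word_attention` weights the W words. All names, the shapes, and the final sum pooling in `proposal_score` are illustrative assumptions.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn raw scores into attention weights that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def word_proposal_affinity(
    unit_scores: Sequence[Sequence[float]],
    unit_attention: Sequence[float],
    word_attention: Sequence[float],
) -> List[List[float]]:
    """Hypothetical W x K word-proposal affinity for a single proposal.

    unit_scores[w][k]: assumed match between description word w and visual unit k
    unit_attention[k]: assumed visual-unit attention for this proposal
    word_attention[w]: assumed word-level attention over the description
    """
    if len(unit_scores) != len(word_attention):
        raise ValueError("need one row of unit scores per word")
    affinity = []
    for w, row in enumerate(unit_scores):
        if len(row) != len(unit_attention):
            raise ValueError("each row needs one score per visual unit")
        # Element-wise product of the score row with both attention terms.
        affinity.append(
            [word_attention[w] * unit_attention[k] * s for k, s in enumerate(row)]
        )
    return affinity


def proposal_score(affinity: Sequence[Sequence[float]]) -> float:
    """Assumed pooling: sum the affinity matrix into one proposal-description score."""
    return sum(sum(row) for row in affinity)


if __name__ == "__main__":
    scores = [[1.0, 2.0], [3.0, 4.0]]  # 2 words x 2 visual units (toy values)
    a_v = softmax([0.0, 0.0])          # uniform visual-unit attention: [0.5, 0.5]
    a_w = [0.25, 0.75]                 # word-level attention
    A = word_proposal_affinity(scores, a_v, a_w)
    print(A)                  # [[0.125, 0.25], [1.125, 1.5]]
    print(proposal_score(A))  # 3.0
```

Multiplying element-wise lets either attention term suppress a word or a visual unit entirely (a weight near zero zeroes that entry), which is one plausible reason to prefer it over adding the terms; how the paper actually pools the affinities into detection scores is not described in the abstract.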




How to Cite

Li, Z., Yao, L., Zhang, X., Wang, X., Kanhere, S., & Zhang, H. (2019). Zero-Shot Object Detection with Textual Descriptions. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01), 8690-8697.



AAAI Technical Track: Vision