Zero-Shot Object Detection with Textual Descriptions

Zhihui Li; Lina Yao; Xiaoqin Zhang; Xianzhi Wang; Salil Kanhere; Huaxiang Zhang

doi:10.1609/aaai.v33i01.33018690

Authors

Zhihui Li, University of New South Wales
Lina Yao, University of New South Wales
Xiaoqin Zhang, Wenzhou University
Xianzhi Wang, University of New South Wales
Salil Kanhere, University of New South Wales
Huaxiang Zhang, Shandong Normal University

DOI:

https://doi.org/10.1609/aaai.v33i01.33018690

Abstract

Object detection is important in real-world applications. Existing methods mainly focus on object detection with sufficient labelled training data or on zero-shot object detection with only concept names. In this paper, we address the challenging problem of zero-shot object detection with natural language descriptions, which aims to simultaneously detect and recognize novel concept instances from textual descriptions. We propose a novel deep learning framework that jointly learns visual units, visual-unit attention and word-level attention, which are combined by element-wise multiplication to obtain a word-proposal affinity. To the best of our knowledge, this is the first work on zero-shot object detection with textual descriptions. Since there is no directly related work in the literature, we investigate plausible solutions based on existing zero-shot object detection methods for a fair comparison. Extensive experiments on three challenging benchmark datasets confirm the superiority of the proposed model.
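
The abstract names the components (visual units, visual-unit attention, word-level attention) and says they are combined by element-wise multiplication into a word-proposal affinity, but it does not give the formulation. The following is only a toy sketch of one way such a combination could look, not the authors' model. Everything in it is an assumption made for illustration: the function names, the use of dot products and softmax, the shared embedding dimension for words and visual units, the `word_query` vector, and the final summation over words.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def word_proposal_affinity(proposals, units, word_embs, word_query):
    """Hypothetical word-proposal affinity (illustrative assumption only).

    proposals:  P region-proposal feature vectors of dimension D
    units:      K visual-unit vectors of dimension D
    word_embs:  W word embeddings, assumed to live in the same D-dim space
    word_query: a D-dim vector used to score how much each word matters
    Returns a W x P matrix of affinities.
    """
    # Response of every visual unit to every proposal (P x K).
    unit_resp = [[dot(p, u) for u in units] for p in proposals]
    # Visual-unit attention: for each word, a distribution over units (W x K).
    unit_attn = [softmax([dot(w, u) for u in units]) for w in word_embs]
    # Word-level attention: a distribution over the words (length W).
    word_attn = softmax([dot(w, word_query) for w in word_embs])

    affinity = []
    for i in range(len(word_embs)):
        row = []
        for j in range(len(proposals)):
            # Element-wise product of unit attention and unit responses,
            # summed over units, then scaled by the word's attention weight.
            matched = sum(a * r for a, r in zip(unit_attn[i], unit_resp[j]))
            row.append(word_attn[i] * matched)
        affinity.append(row)
    return affinity


def score_proposals(affinity):
    """Aggregate word-proposal affinities into one score per proposal."""
    n_proposals = len(affinity[0])
    return [sum(row[j] for row in affinity) for j in range(n_proposals)]


# Toy usage: two proposals, two visual units, a one-word description.
proposals = [[1.0, 0.0], [0.0, 1.0]]
units = [[1.0, 0.0], [0.0, 1.0]]
word_embs = [[2.0, 0.0]]
word_query = [1.0, 0.0]

affinity = word_proposal_affinity(proposals, units, word_embs, word_query)
scores = score_proposals(affinity)
```

In this toy setup the single word aligns with the first visual unit, so the first proposal receives the higher score. A real system would learn the units and both attention mechanisms from data rather than fix them by hand.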
