Image Caption with Global-Local Attention

Linghui Li; Sheng Tang; Lixi Deng; Yongdong Zhang; Qi Tian

doi:10.1609/aaai.v31i1.11236

Authors

Linghui Li Key Lab of Intelligent Information Processing of Chinese Academy of Sciences
Sheng Tang Key Lab of Intelligent Information Processing of Chinese Academy of Sciences
Lixi Deng Key Lab of Intelligent Information Processing of Chinese Academy of Sciences
Yongdong Zhang Key Lab of Intelligent Information Processing of Chinese Academy of Sciences
Qi Tian University of Texas at San Antonio

DOI:

https://doi.org/10.1609/aaai.v31i1.11236

Keywords:

CNN, RNN, image description

Abstract

Image captioning is becoming increasingly important in artificial intelligence. Most existing methods based on the CNN-RNN framework suffer from missing objects and mispredictions because they rely solely on a global, image-level representation. To address these problems, we propose a global-local attention (GLA) method that integrates local, object-level representations with the global, image-level representation through an attention mechanism. The proposed method can thus focus on predicting salient objects more precisely and with high recall while concurrently preserving image-level context. As a result, GLA generates more relevant sentences and achieves state-of-the-art performance on the well-known Microsoft COCO caption dataset under several popular metrics.
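
The abstract does not give the attention formulation, so the following is only a minimal sketch of the general idea: one global (image-level) feature vector and several local (object-level) feature vectors are weighted by a softmax over relevance scores and summed into a single context vector. The function name `global_local_attention`, the dot-product scoring against a decoder state `hidden`, and the toy feature values are all assumptions made for illustration and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def global_local_attention(global_feat, local_feats, hidden):
    """Fuse one global and several local feature vectors into one context vector.

    Scoring each feature by its dot product with the decoder state is an
    assumption of this sketch, not the scoring function used in the paper.
    """
    # Candidate 0 is the image-level feature; the rest are object-level features.
    candidates = [global_feat] + list(local_feats)
    weights = softmax([dot(f, hidden) for f in candidates])
    dim = len(global_feat)
    # Context vector: attention-weighted sum of all candidate features.
    context = [
        sum(w * f[i] for w, f in zip(weights, candidates)) for i in range(dim)
    ]
    return context, weights


# Toy, made-up inputs: one global feature, two object features, one decoder state.
global_feat = [0.2, 0.1, 0.4]
local_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
hidden = [2.0, 0.0, 0.0]

context, weights = global_local_attention(global_feat, local_feats, hidden)
```

In this toy setup the first object feature aligns most closely with the decoder state and so receives the largest weight, while the global feature still contributes image-level context to the fused vector.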
