Image Caption with Global-Local Attention

Authors

  • Linghui Li, Key Lab of Intelligent Information Processing of Chinese Academy of Sciences
  • Sheng Tang, Key Lab of Intelligent Information Processing of Chinese Academy of Sciences
  • Lixi Deng, Key Lab of Intelligent Information Processing of Chinese Academy of Sciences
  • Yongdong Zhang, Key Lab of Intelligent Information Processing of Chinese Academy of Sciences
  • Qi Tian, University of Texas at San Antonio

DOI:

https://doi.org/10.1609/aaai.v31i1.11236

Keywords:

CNN, RNN, image description

Abstract

Image captioning is becoming increasingly important in the field of artificial intelligence. Most existing methods based on the CNN-RNN framework suffer from object missing and misprediction because they rely solely on a global, image-level representation. To address these problems, in this paper we propose a global-local attention (GLA) method that integrates local, object-level representations with the global, image-level representation through an attention mechanism. Our method can thus predict salient objects more precisely and with higher recall while concurrently preserving image-level context. As a result, the proposed GLA method generates more relevant sentences and achieves state-of-the-art performance on the well-known Microsoft COCO caption dataset under several popular metrics.
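The fusion described in the abstract can be illustrated with a minimal sketch. The code below is a hypothetical NumPy illustration, not the paper's implementation: all names (`global_local_attention`, `W_l`, `W_g`, `w`) and the additive scoring form are assumptions; the paper's exact attention formulation and the RNN decoder are omitted. It shows the core idea of scoring object-level feature vectors against a decoder state, attention-weighting them, and fusing the result with the image-level feature.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def global_local_attention(global_feat, local_feats, hidden, W_l, W_g, w):
    """Hypothetical sketch of a global-local attention step.

    global_feat: (d,) image-level CNN feature
    local_feats: (k, d) object-level features (e.g., from detected regions)
    hidden:      (d,) decoder hidden state
    W_l, W_g:    (d, d) projection matrices; w: (d,) scoring vector
    Returns the fused context vector and the attention weights.
    """
    # Score each local feature against the current decoder state
    # (additive attention form, assumed for illustration).
    scores = np.array([w @ np.tanh(W_l @ f + W_g @ hidden) for f in local_feats])
    alpha = softmax(scores)                          # weights over objects, sum to 1
    local_ctx = (alpha[:, None] * local_feats).sum(axis=0)
    # Fuse global (image-level) and attended local (object-level) context.
    return np.concatenate([global_feat, local_ctx]), alpha
```

At each decoding step the fused context would be fed, together with the previous word, into the RNN that emits the next caption word; the attention weights shift across objects as the sentence is generated.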

Published

2017-02-12

How to Cite

Li, L., Tang, S., Deng, L., Zhang, Y., & Tian, Q. (2017). Image Caption with Global-Local Attention. Proceedings of the AAAI Conference on Artificial Intelligence, 31(1). https://doi.org/10.1609/aaai.v31i1.11236