Text-Guided Attention Model for Image Captioning

Authors

  • Jonghwan Mun, Pohang University of Science and Technology (POSTECH)
  • Minsu Cho, Pohang University of Science and Technology (POSTECH)
  • Bohyung Han, Pohang University of Science and Technology (POSTECH)

DOI:

https://doi.org/10.1609/aaai.v31i1.11237

Keywords:

Image Captioning, Attention Model

Abstract

Visual attention plays an important role in understanding images and has proven effective in generating natural language descriptions of them. On the other hand, recent studies show that language associated with an image can steer visual attention over the scene during human cognitive processing. Inspired by this, we introduce a text-guided attention model for image captioning, which learns to drive visual attention using associated captions. For this model, we propose an exemplar-based learning approach that retrieves captions associated with each image from the training data and uses them to learn attention on visual features. Our attention model enables the captioner to describe detailed scene states by effectively distinguishing small or easily confusable objects. We validate our model on the MS-COCO Captioning benchmark and achieve state-of-the-art performance on standard metrics.
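To make the idea concrete, the sketch below shows one way a retrieved guidance caption could modulate attention over convolutional feature-map regions. This is an illustrative assumption, not the authors' released implementation: the module names, dimensions, and the additive-attention form are all hypothetical choices consistent with the abstract's description.

```python
# Minimal sketch of text-guided attention in PyTorch (hypothetical; not the
# paper's official code). A retrieved "guidance" caption embedding steers
# attention over spatial visual features from a CNN.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TextGuidedAttention(nn.Module):
    """Attend over image regions using a guidance-caption embedding."""

    def __init__(self, visual_dim=2048, text_dim=512, hidden_dim=512):
        super().__init__()
        self.visual_proj = nn.Linear(visual_dim, hidden_dim)  # per-region projection
        self.text_proj = nn.Linear(text_dim, hidden_dim)      # caption projection
        self.score = nn.Linear(hidden_dim, 1)                 # scalar relevance per region

    def forward(self, visual_feats, caption_emb):
        # visual_feats: (B, R, visual_dim), R spatial regions of a CNN feature map
        # caption_emb:  (B, text_dim), embedding of a retrieved guidance caption
        v = self.visual_proj(visual_feats)                     # (B, R, H)
        t = self.text_proj(caption_emb).unsqueeze(1)           # (B, 1, H), broadcast over R
        scores = self.score(torch.tanh(v + t)).squeeze(-1)     # (B, R) additive attention
        alpha = F.softmax(scores, dim=-1)                      # attention weights over regions
        context = (alpha.unsqueeze(-1) * visual_feats).sum(1)  # (B, visual_dim) attended feature
        return context, alpha


if __name__ == "__main__":
    att = TextGuidedAttention()
    feats = torch.randn(2, 49, 2048)    # e.g., a 7x7 feature map flattened to 49 regions
    caption = torch.randn(2, 512)       # guidance-caption embedding (from any text encoder)
    context, alpha = att(feats, caption)
    print(context.shape, alpha.shape)   # torch.Size([2, 2048]) torch.Size([2, 49])
```

The attended context vector would then condition the caption decoder, so that regions relevant to the retrieved exemplar caption receive more weight than in purely image-driven attention.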

Published

2017-02-12

How to Cite

Mun, J., Cho, M., & Han, B. (2017). Text-Guided Attention Model for Image Captioning. Proceedings of the AAAI Conference on Artificial Intelligence, 31(1). https://doi.org/10.1609/aaai.v31i1.11237