Image Captioning with Context-Aware Auxiliary Guidance

Authors

  • Zeliang Song, Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
  • Xiaofei Zhou, Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
  • Zhendong Mao, University of Science and Technology of China, Hefei, China
  • Jianlong Tan, Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China

DOI:

https://doi.org/10.1609/aaai.v35i3.16361

Keywords:

Language and Vision

Abstract

Image captioning is a challenging computer vision task that aims to generate a natural language description of an image. Most recent work follows the encoder-decoder framework, which depends heavily on the previously generated words for the current prediction. Such methods cannot effectively exploit information from future predictions to learn complete semantics. In this paper, we propose a Context-Aware Auxiliary Guidance (CAAG) mechanism that guides the captioning model to perceive global contexts. On top of the captioning model, CAAG performs semantic attention that selectively concentrates on useful information in the global predictions to reproduce the current generation. To validate the adaptability of the method, we apply CAAG to three popular captioners, and our proposal achieves competitive performance on the challenging Microsoft COCO image captioning benchmark, e.g., a CIDEr-D score of 132.2 on the Karpathy split and a CIDEr-D (c40) score of 130.7 on the official online evaluation server.
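The abstract describes CAAG as semantic attention over the global predictions used to reproduce the current word. The sketch below illustrates that general idea under stated assumptions; it is not the authors' implementation. It assumes a first-pass caption whose word embeddings are all available at step t, swaps in plain scaled dot-product attention (no learned projections), and masks position t so the word must be reconstructed from its surrounding context. The names `semantic_attention`, `global_words`, and `current_step` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax; entries equal to -inf receive zero weight."""
    m = max(scores)
    exps = [0.0 if s == -math.inf else math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def semantic_attention(
    query: Sequence[float],
    global_words: Sequence[Sequence[float]],
    current_step: int,
) -> Tuple[Vector, Vector]:
    """Hypothetical sketch: attend over embeddings of globally predicted words.

    query:        decoder hidden state at step t (any d-dimensional vector)
    global_words: one embedding per word of an assumed first-pass caption
    current_step: index t, masked so word t must be reproduced from context
    Returns (attention weights, context vector).
    """
    if len(global_words) < 2:
        raise ValueError("need at least one unmasked word to attend over")
    d = len(query)
    scale = math.sqrt(d)
    scores = []
    for i, emb in enumerate(global_words):
        if i == current_step:
            scores.append(-math.inf)  # assumption: hide the target word itself
        else:
            # scaled dot product between the query and word i's embedding
            scores.append(sum(q * e for q, e in zip(query, emb)) / scale)
    weights = softmax(scores)
    # context = attention-weighted sum of the word embeddings
    context = [sum(w * emb[k] for w, emb in zip(weights, global_words)) for k in range(d)]
    return weights, context


# Toy example: 2-d embeddings for a 3-word caption, reproducing word 1.
weights, context = semantic_attention(
    [1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], current_step=1
)
print([round(w, 4) for w in weights], [round(c, 4) for c in context])
```

Because position t is masked, the context vector is built only from the words before and after t, which is one plausible reading of using global predictions to reproduce the current generation. A trained model would normally apply learned query and key projections before the dot product.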

Published

2021-05-18

How to Cite

Song, Z., Zhou, X., Mao, Z., & Tan, J. (2021). Image Captioning with Context-Aware Auxiliary Guidance. Proceedings of the AAAI Conference on Artificial Intelligence, 35(3), 2584-2592. https://doi.org/10.1609/aaai.v35i3.16361

Issue

Vol. 35 No. 3

Section

AAAI Technical Track on Computer Vision II