Memory-Augmented Image Captioning

Authors

  • Zhengcong Fei Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China University of Chinese Academy of Sciences, Beijing 100049, China

Keywords:

Language and Vision, Multimodal Learning

Abstract

Current deep learning-based image captioning systems have been shown to store practical knowledge in their parameters and to achieve competitive performance on public datasets. Nevertheless, their ability to access and precisely manipulate this mastered knowledge remains limited. Moreover, providing evidence for decisions and updating stored information are important yet underexplored capabilities. Towards this goal, we introduce a memory-augmented method that extends an existing image captioning model by incorporating extra explicit knowledge from a memory bank. Relevant knowledge is recalled according to similarity distance in the embedding space of the history context, and the memory bank can be constructed conveniently from any matched image-text set, e.g., the original training data. When this non-parametric memory-augmented method is incorporated into various captioning baselines, the performance of the resulting captioners improves consistently on the evaluation benchmark. More encouragingly, extensive experiments demonstrate that our approach can efficiently adapt to larger training datasets by simply transferring the memory bank, without any additional training.
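The core mechanism the abstract describes — recalling entries from a non-parametric memory bank by embedding-space similarity — can be sketched as follows. This is an illustrative toy, not the paper's implementation: the class name `MemoryBank`, the cosine-similarity scoring, and the (embedding, token) storage layout are all assumptions made for clarity.

```python
import math


class MemoryBank:
    """Toy non-parametric memory bank (illustrative assumption, not the
    paper's actual code): stores (context embedding, caption token) pairs
    and recalls the tokens whose embeddings are most similar to a query."""

    def __init__(self):
        self.keys = []    # context embeddings, one list of floats each
        self.values = []  # caption tokens associated with each embedding

    @staticmethod
    def _cosine(a, b):
        # Cosine similarity between two embedding vectors.
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)

    def add(self, embedding, token):
        # Build the bank from any matched image-text set,
        # e.g., embeddings of the training captions.
        self.keys.append(embedding)
        self.values.append(token)

    def recall(self, query, k=3):
        # Rank stored entries by similarity to the history-context
        # embedding and return the top-k associated tokens.
        order = sorted(
            range(len(self.keys)),
            key=lambda i: self._cosine(query, self.keys[i]),
            reverse=True,
        )
        return [self.values[i] for i in order[:k]]
```

Because the bank is non-parametric, adapting to a larger dataset amounts to calling `add` on the new pairs (or swapping in a different bank) with no gradient updates, which mirrors the transfer property claimed in the abstract.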

Published

2021-05-18

How to Cite

Fei, Z. (2021). Memory-Augmented Image Captioning. Proceedings of the AAAI Conference on Artificial Intelligence, 35(2), 1317-1324. Retrieved from https://ojs.aaai.org/index.php/AAAI/article/view/16220

Section

AAAI Technical Track on Computer Vision I