TY - JOUR AU - Wang, Li AU - Bai, Zechen AU - Zhang, Yonghua AU - Lu, Hongtao PY - 2020/04/03 Y2 - 2024/03/28 TI - Show, Recall, and Tell: Image Captioning with Recall Mechanism JF - Proceedings of the AAAI Conference on Artificial Intelligence JA - AAAI VL - 34 IS - 07 SE - AAAI Technical Track: Vision DO - 10.1609/aaai.v34i07.6898 UR - https://ojs.aaai.org/index.php/AAAI/article/view/6898 SP - 12176-12183 AB - <p>Generating natural and accurate descriptions in image captioning has always been a challenge. In this paper, we propose a novel recall mechanism to imitate the way human conduct captioning. There are three parts in our recall mechanism : recall unit, semantic guide (SG) and recalled-word slot (RWS). Recall unit is a text-retrieval module designed to retrieve recalled words for images. SG and RWS are designed for the best use of recalled words. SG branch can generate a recalled context, which can guide the process of generating caption. RWS branch is responsible for copying recalled words to the caption. Inspired by pointing mechanism in text summarization, we adopt a soft switch to balance the generated-word probabilities between SG and RWS. In the CIDEr optimization step, we also introduce an individual recalled-word reward (WR) to boost training. Our proposed methods (SG+RWS+WR) achieve BLEU-4 / CIDEr / SPICE scores of 36.6 / 116.9 / 21.3 with cross-entropy loss and 38.7 / 129.1 / 22.4 with CIDEr optimization on MSCOCO Karpathy test split, which surpass the results of other state-of-the-art methods.</p> ER -