Show, Reward and Tell: Automatic Generation of Narrative Paragraph From Photo Stream by Adversarial Training

Jing Wang; Jianlong Fu; Jinhui Tang; Zechao Li; Tao Mei

doi:10.1609/aaai.v32i1.12318

Authors

Jing Wang, Nanjing University of Science and Technology
Jianlong Fu, Microsoft Research
Jinhui Tang, Nanjing University of Science and Technology
Zechao Li, Nanjing University of Science and Technology
Tao Mei, Microsoft Research

DOI:

https://doi.org/10.1609/aaai.v32i1.12318

Keywords:

storytelling, reinforcement learning, adversarial training

Abstract

Impressive results in image captioning (i.e., generating an objective description for an image) have been achieved with plenty of training pairs. In this paper, we take one step further to investigate the creation of a narrative paragraph for a photo stream. This task is even more challenging due to the difficulty in modeling an ordered photo sequence and in generating a relevant paragraph with an expressive language style for storytelling. The difficulty is further exacerbated by the limited training data, so that existing approaches mostly focus on search-based solutions. To deal with these challenges, we propose a sequence-to-sequence modeling approach with reinforcement learning and adversarial training. First, to model the ordered photo stream, we propose a hierarchical recurrent neural network as the story generator, which is optimized by reinforcement learning with rewards. Second, to generate relevant and story-style paragraphs, we design the rewards with two critic networks: a multi-modal discriminator and a language-style discriminator. Third, we further consider the story generator and reward critics as adversaries. The generator aims to create paragraphs that are indistinguishable from human-level stories, whereas the critics aim at distinguishing them and further improving the generator by policy gradient. Experiments on three widely used datasets show the effectiveness of the approach against state-of-the-art methods, with a relative increase of 20.2% in METEOR. We also show the subjective preference for the proposed approach over the baselines through a user study with 30 human subjects.
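
The idea of optimizing a generator by policy gradient against rewards from two critics can be illustrated with a toy sketch. This is not the authors' implementation: the one-token "policy", the hand-written critics, the equal weighting of the two scores, and every name below (combined_reward, policy_gradient_step, train_policy) are assumptions made for illustration only. The sketch uses the exact gradient of the expected reward for a softmax policy; in a real generator that expectation would be estimated from sampled paragraphs.

```python
import math


def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combined_reward(token, relevance_critic, style_critic, weight=0.5):
    """Mix two critic scores into one reward (the weighting is an assumption)."""
    return weight * relevance_critic(token) + (1.0 - weight) * style_critic(token)


def policy_gradient_step(logits, rewards, lr=1.0):
    """One gradient-ascent step on the expected reward J = sum_k p_k * r_k.

    For a softmax policy, dJ/dlogit_k = p_k * (r_k - J).
    """
    probs = softmax(logits)
    expected = sum(p * r for p, r in zip(probs, rewards))
    return [
        logit + lr * p * (r - expected)
        for logit, p, r in zip(logits, probs, rewards)
    ]


def train_policy(num_tokens=4, steps=200):
    """Train a toy policy over `num_tokens` candidate outputs."""
    # Hypothetical critics: token 2 is both relevant and story-like,
    # token 1 is story-like only, tokens 0 and 3 are neither.
    relevance_critic = lambda t: 1.0 if t == 2 else 0.0
    style_critic = lambda t: 1.0 if t in (1, 2) else 0.0
    rewards = [
        combined_reward(t, relevance_critic, style_critic)
        for t in range(num_tokens)
    ]
    logits = [0.0] * num_tokens
    for _ in range(steps):
        logits = policy_gradient_step(logits, rewards)
    return softmax(logits)


if __name__ == "__main__":
    print(train_policy())
```

In this toy setting the policy shifts probability mass toward the output that both critics score highly. In adversarial training, the critics would themselves be networks updated alongside the generator rather than fixed functions.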
