Show, Reward and Tell: Automatic Generation of Narrative Paragraph From Photo Stream by Adversarial Training

Authors

  • Jing Wang, Nanjing University of Science and Technology
  • Jianlong Fu, Microsoft Research
  • Jinhui Tang, Nanjing University of Science and Technology
  • Zechao Li, Nanjing University of Science and Technology
  • Tao Mei, Microsoft Research

DOI:

https://doi.org/10.1609/aaai.v32i1.12318

Keywords:

storytelling, reinforcement learning, adversarial training

Abstract

Image captioning (i.e., generating an objective description of an image) has achieved impressive results given plenty of training pairs. In this paper, we go one step further and investigate the creation of a narrative paragraph for a photo stream. This task is more challenging because of the difficulty of modeling an ordered photo sequence and of generating a relevant paragraph with an expressive language style for storytelling. The difficulty is further exacerbated by limited training data, so existing approaches mostly focus on search-based solutions. To address these challenges, we propose a sequence-to-sequence modeling approach with reinforcement learning and adversarial training. First, to model the ordered photo stream, we propose a hierarchical recurrent neural network as the story generator, which is optimized by reinforcement learning with rewards. Second, to generate relevant and story-style paragraphs, we design the rewards with two critic networks: a multi-modal discriminator and a language-style discriminator. Third, we treat the story generator and the reward critics as adversaries. The generator aims to create paragraphs that are indistinguishable from human-level stories, whereas the critics aim to distinguish them and further improve the generator by policy gradient. Experiments on three widely used datasets show the effectiveness of the approach, with a relative increase of 20.2% in METEOR over state-of-the-art methods. A user study with 30 human subjects also shows a subjective preference for the proposed approach over the baselines.
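
The reward-driven policy-gradient idea in the abstract can be illustrated with a toy sketch. This is not the paper's implementation. It assumes a single categorical policy over three "words" in place of the hierarchical recurrent generator, fixed hypothetical scores in place of the learned multi-modal and language-style discriminators, an equal weighting of the two scores, and a moving-average baseline. All function names and numbers are illustrative.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combined_reward(multimodal_score, style_score, weight=0.5):
    """Blend two critic scores in [0, 1]. The equal weighting is an assumption."""
    return weight * multimodal_score + (1.0 - weight) * style_score


def reinforce_step(logits, action, reward, baseline, lr=0.1):
    """One REINFORCE update for a single categorical choice.

    Uses d log pi(action) / d logit_i = 1[i == action] - pi_i.
    """
    probs = softmax(logits)
    advantage = reward - baseline
    return [
        logit + lr * advantage * ((1.0 if i == action else 0.0) - p)
        for i, (logit, p) in enumerate(zip(logits, probs))
    ]


def train_toy_policy(steps=500, seed=0):
    """Train the toy policy against fixed, hypothetical critic scores."""
    rng = random.Random(seed)
    logits = [0.0, 0.0, 0.0]
    # Hypothetical (relevance, style) scores per word; real critics are learned.
    critic_scores = [(0.2, 0.1), (0.9, 0.8), (0.4, 0.3)]
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(logits)), weights=probs)[0]
        reward = combined_reward(*critic_scores[action])
        logits = reinforce_step(logits, action, reward, baseline)
        # Moving-average baseline to reduce gradient variance (an assumption).
        baseline = 0.9 * baseline + 0.1 * reward
    return logits


if __name__ == "__main__":
    print(softmax(train_toy_policy()))
```

In an adversarial setup such as the one the abstract describes, the critics would also be updated to separate generated paragraphs from human-written ones. The sketch holds them fixed to keep the policy-gradient step easy to follow.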

Published

2018-04-27

How to Cite

Wang, J., Fu, J., Tang, J., Li, Z., & Mei, T. (2018). Show, Reward and Tell: Automatic Generation of Narrative Paragraph From Photo Stream by Adversarial Training. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1). https://doi.org/10.1609/aaai.v32i1.12318