Sequence Generation with Optimal-Transport-Enhanced Reinforcement Learning

Authors

  • Liqun Chen, Duke University
  • Ke Bai, Duke University
  • Chenyang Tao, Duke University
  • Yizhe Zhang, Duke University
  • Guoyin Wang, Duke University
  • Wenlin Wang, Duke University
  • Ricardo Henao, Duke University
  • Lawrence Carin, Duke University

DOI:

https://doi.org/10.1609/aaai.v34i05.6249

Abstract

Reinforcement learning (RL) has been widely used to aid training in language generation. This is achieved by enhancing standard maximum likelihood objectives with user-specified reward functions that encourage global semantic consistency. We propose a principled approach to address the difficulties associated with RL-based solutions, namely, high-variance gradients, uninformative rewards, and brittle training. By leveraging the optimal transport distance, we introduce a regularizer that significantly alleviates the above issues. Our formulation emphasizes the preservation of semantic features, enabling end-to-end training instead of ad-hoc fine-tuning, and when combined with RL, it controls the exploration space for more efficient model updates. To validate the effectiveness of the proposed solution, we perform a comprehensive evaluation covering a wide variety of NLP tasks: machine translation, abstractive text summarization, and image captioning, with consistent improvements over competing solutions.
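
The abstract does not specify the exact optimal transport formulation; the snippet below is a minimal sketch of one common choice, an entropic (Sinkhorn-style) approximation of the OT distance between the token embeddings of a generated sequence and a reference sequence, which could then be added as a differentiable regularizer to an MLE or policy-gradient loss. The function name sinkhorn_ot_distance, the cosine cost, and the hyperparameters epsilon and n_iters are illustrative assumptions, not the authors' implementation.

    # Illustrative sketch only: Sinkhorn approximation of an OT distance
    # between two embedded token sequences, usable as a sequence-level
    # regularizer alongside MLE / RL objectives.
    import torch

    def sinkhorn_ot_distance(x, y, epsilon=0.1, n_iters=50):
        # x: (m, d) embeddings of generated tokens
        # y: (n, d) embeddings of reference tokens
        m, n = x.size(0), y.size(0)

        # Cosine cost between token embeddings.
        x_norm = torch.nn.functional.normalize(x, dim=-1)
        y_norm = torch.nn.functional.normalize(y, dim=-1)
        cost = 1.0 - x_norm @ y_norm.t()                 # (m, n)

        # Uniform marginals over the two sequences.
        a = torch.full((m,), 1.0 / m, device=x.device)
        b = torch.full((n,), 1.0 / n, device=x.device)

        # Sinkhorn iterations on the Gibbs kernel.
        K = torch.exp(-cost / epsilon)
        u = torch.ones_like(a)
        for _ in range(n_iters):
            v = b / (K.t() @ u + 1e-9)
            u = a / (K @ v + 1e-9)
        plan = torch.diag(u) @ K @ torch.diag(v)         # approximate transport plan

        return (plan * cost).sum()

    # Hypothetical usage: total loss combining MLE, an RL (policy-gradient)
    # term, and the OT regularizer, with gen_emb / ref_emb taken from the
    # model's word-embedding layer:
    # loss = mle_loss + rl_loss + lambda_ot * sinkhorn_ot_distance(gen_emb, ref_emb)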

Published

2020-04-03

How to Cite

Chen, L., Bai, K., Tao, C., Zhang, Y., Wang, G., Wang, W., Henao, R., & Carin, L. (2020). Sequence Generation with Optimal-Transport-Enhanced Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 34(05), 7512-7520. https://doi.org/10.1609/aaai.v34i05.6249

Issue

Vol. 34 No. 05 (2020)

Section

AAAI Technical Track: Natural Language Processing