TY  - JOUR
AU  - Hu, J. Edward
AU  - Rudinger, Rachel
AU  - Post, Matt
AU  - Van Durme, Benjamin
PY  - 2019/07/17
Y2  - 2024/03/29
TI  - PARABANK: Monolingual Bitext Generation and Sentential Paraphrasing via Lexically-Constrained Neural Machine Translation
JF  - Proceedings of the AAAI Conference on Artificial Intelligence
JA  - AAAI
VL  - 33
IS  - 01
SE  - AAAI Technical Track: Natural Language Processing
DO  - 10.1609/aaai.v33i01.33016521
UR  - https://ojs.aaai.org/index.php/AAAI/article/view/4618
SP  - 6521-6528
AB  - We present PARABANK, a large-scale English paraphrase dataset that surpasses prior work in both quantity and quality. Following the approach of PARANMT (Wieting and Gimpel, 2018), we train a Czech-English neural machine translation (NMT) system to generate novel paraphrases of English reference sentences. By adding lexical constraints to the NMT decoding procedure, however, we are able to produce multiple high-quality sentential paraphrases per source sentence, yielding an English paraphrase resource with more than 4 billion generated tokens and exhibiting greater lexical diversity. Using human judgments, we also demonstrate that PARABANK’s paraphrases improve over PARANMT on both semantic similarity and fluency. Finally, we use PARABANK to train a monolingual NMT model with the same support for lexically-constrained decoding for sentence rewriting tasks.
ER  - 