Unsupervised Deep Keyphrase Generation

Authors

  • Xianjie Shen (University of California, San Diego)
  • Yinghan Wang (Amazon.com Inc)
  • Rui Meng (Salesforce Research)
  • Jingbo Shang (University of California, San Diego)

DOI:

https://doi.org/10.1609/aaai.v36i10.21381

Keywords:

Speech & Natural Language Processing (SNLP)

Abstract

Keyphrase generation aims to summarize long documents with a collection of salient phrases. Deep neural models have demonstrated remarkable success in this task, with the capability of predicting keyphrases that are even absent from a document. However, such abstractiveness is acquired at the expense of a substantial amount of annotated data. In this paper, we present a novel method for keyphrase generation, AutoKeyGen, without the supervision of any annotated doc-keyphrase pairs. Motivated by the observation that an absent keyphrase in a document may appear in other places, in whole or in part, we construct a phrase bank by pooling all phrases extracted from a corpus. With this phrase bank, we assign phrase candidates to new documents by a simple partial matching algorithm, and then we rank these candidates by their relevance to the document from both lexical and semantic perspectives. Moreover, we bootstrap a deep generative model using these top-ranked pseudo keyphrases to produce more absent candidates. Extensive experiments demonstrate that AutoKeyGen outperforms all unsupervised baselines and can even beat a strong supervised method in certain cases.
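
The phrase-bank idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the n-gram phrase extractor, the stopword list, the "shares at least one token" matching rule, and the count-based lexical score are all simplifying assumptions made here for illustration, and the function names (extract_phrases, build_phrase_bank, candidates_for, rank_lexical) are hypothetical. The semantic ranking and the bootstrapped generative model are omitted.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list, not the one used in the paper.
STOPWORDS = {"a", "an", "the", "of", "for", "and", "in", "on", "to", "with", "is", "are", "we"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def extract_phrases(text, max_len=3):
    """Stand-in phrase extractor: stopword-free n-grams of up to max_len words."""
    tokens = tokenize(text)
    phrases = set()
    for i in range(len(tokens)):
        for n in range(1, max_len + 1):
            gram = tokens[i:i + n]
            if len(gram) == n and not any(t in STOPWORDS for t in gram):
                phrases.add(" ".join(gram))
    return phrases


def build_phrase_bank(corpus):
    """Pool the phrases extracted from every document in the corpus."""
    bank = set()
    for doc in corpus:
        bank |= extract_phrases(doc)
    return bank


def candidates_for(doc, bank):
    """Assumed partial-matching rule: keep bank phrases sharing a token with doc.

    Returns (present, absent): phrases that occur verbatim in the document,
    and phrases that only partially overlap with it.
    """
    doc_tokens = tokenize(doc)
    token_set = set(doc_tokens)
    padded = " " + " ".join(doc_tokens) + " "
    present, absent = set(), set()
    for phrase in bank:
        if not token_set.intersection(phrase.split()):
            continue  # no lexical overlap with the document
        if f" {phrase} " in padded:
            present.add(phrase)
        else:
            absent.add(phrase)
    return present, absent


def rank_lexical(doc, phrases):
    """Rank phrases by the mean in-document frequency of their words (ties: alphabetical)."""
    counts = Counter(tokenize(doc))

    def score(phrase):
        words = phrase.split()
        return sum(counts[w] for w in words) / len(words)

    return sorted(phrases, key=lambda p: (-score(p), p))
```

Under these assumptions, a phrase such as "keyphrase generation" that never appears verbatim in a new document can still surface as an absent candidate, provided another document in the corpus contributed it to the bank and the new document shares one of its words.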

Published

2022-06-28

How to Cite

Shen, X., Wang, Y., Meng, R., & Shang, J. (2022). Unsupervised Deep Keyphrase Generation. Proceedings of the AAAI Conference on Artificial Intelligence, 36(10), 11303-11311. https://doi.org/10.1609/aaai.v36i10.21381

Section

AAAI Technical Track on Speech and Natural Language Processing