The Value of Paraphrase for Knowledge Base Predicates

Bingcong Xue; Sen Hu; Lei Zou; Jiashu Cheng

doi:10.1609/aaai.v34i05.6475

Authors

Bingcong Xue Peking University
Sen Hu Peking University
Lei Zou Peking University
Jiashu Cheng Culver Academies

Abstract

Paraphrases, i.e., differing textual realizations of the same meaning, have proven useful for many natural language processing (NLP) applications. Collecting paraphrases for predicates in knowledge bases (KBs) is key to comprehending the RDF triples in KBs. Existing works have published paraphrase datasets automatically extracted from large corpora, but these datasets contain too many redundant pairs or do not cover enough predicates, problems that cannot be fixed by machines alone and require human help. This paper presents a full process for collecting large-scale, high-quality paraphrase dictionaries for KB predicates, which takes advantage of existing datasets and combines machine mining with crowdsourcing. Our dataset comprises 2284 distinct predicates in DBpedia and 31130 paraphrase pairs in total, and its quality is a great leap over previous works. We then demonstrate that such high-quality paraphrase dictionaries can greatly help natural language processing tasks such as question answering and language generation. We also publish our dictionary for further research.
