The Value of Paraphrase for Knowledge Base Predicates
Paraphrase, i.e., differing textual realizations of the same meaning, has proven useful for many natural language processing (NLP) applications. Collecting paraphrase for predicates in knowledge bases (KBs) is the key to comprehend the RDF triples in KBs. Existing works have published some paraphrase datasets automatically extracted from large corpora, but have too many redundant pairs or don't cover enough predicates, which cannot be improved by computer only and need the help of human beings. This paper shows a full process of collecting large-scale and high-quality paraphrase dictionaries for predicates in knowledge bases, which takes advantage of existing datasets and combines the technologies of machine mining and crowdsourcing. Our dataset comprises 2284 distinct predicates in DBpedia and 31130 paraphrase pairs in total, the quality of which is a great leap over previous works. Then it is demonstrated that such good paraphrase dictionaries can do great help to natural language processing tasks such as question answering and language generation. We also publish our own dictionary for further research.