Leveraging Wikipedia Characteristics for Search and Candidate Generation in Question Answering

Jennifer Chu-Carroll; James Fan

doi:10.1609/aaai.v25i1.7968

Authors

Jennifer Chu-Carroll IBM T. J. Watson Research Center
James Fan IBM T. J. Watson Research Center

DOI:

https://doi.org/10.1609/aaai.v25i1.7968

Abstract

Most existing Question Answering (QA) systems adopt a type-and-generate approach to candidate generation that relies on a pre-defined domain ontology. This paper describes a type-independent search and candidate generation paradigm for QA that leverages Wikipedia characteristics. This approach is particularly useful for adapting QA systems to domains where reliable answer type identification and type-based answer extraction are not available. We present a three-pronged search approach motivated by the relations that an answer-justifying, title-oriented document may have with the question/answer pair. We further show how Wikipedia metadata such as anchor texts and redirects can be used to extract candidate answers from search results without a type ontology. Our experimental results show that our strategies obtained high binary recall in both search and candidate generation on TREC questions, a domain that has mature answer type extraction technology, as well as on Jeopardy! questions, a domain without such technology. Our high-recall search and candidate generation approach has also led to high overall QA performance in Watson, our end-to-end system.
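
The abstract does not spell out how candidates are extracted, so the following is only a rough sketch of what type-independent candidate generation from Wikipedia metadata could look like. It treats the title of each retrieved page and the targets of its anchor texts as candidates, and uses a redirect table to merge variant names. The `Page` class, the `generate_candidates` function, the data layout, and the toy data are all assumptions made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Page:
    # Hypothetical stand-in for one retrieved title-oriented document.
    title: str
    # Assumed layout: anchor text -> title of the page the link points to.
    anchors: Dict[str, str] = field(default_factory=dict)


def generate_candidates(pages: List[Page], redirects: Dict[str, str]) -> List[str]:
    """Collect candidates from titles and anchor targets, with no answer typing.

    `redirects` maps an alternate title to its canonical title (assumed format).
    """
    candidates: List[str] = []
    seen = set()
    for page in pages:
        # The page title and every link target are treated as candidates.
        for raw in [page.title, *page.anchors.values()]:
            canonical = redirects.get(raw, raw)  # fold variants via redirects
            key = canonical.lower()
            if key not in seen:  # keep first occurrence, drop duplicates
                seen.add(key)
                candidates.append(canonical)
    return candidates


# Toy data, invented for this sketch.
pages = [Page("Apollo 11", {"Armstrong": "Neil Armstrong", "Aldrin": "Edwin Aldrin"})]
redirects = {"Edwin Aldrin": "Buzz Aldrin"}
print(generate_candidates(pages, redirects))
```

Note that nothing in the sketch consults an answer type: every title or link target is kept, which favors recall and leaves filtering to later scoring stages.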
