Leveraging Wikipedia Characteristics for Search and Candidate Generation in Question Answering

Authors

  • Jennifer Chu-Carroll IBM T. J. Watson Research Center
  • James Fan IBM T. J. Watson Research Center

DOI

https://doi.org/10.1609/aaai.v25i1.7968

Abstract

Most existing Question Answering (QA) systems adopt a type-and-generate approach to candidate generation that relies on a pre-defined domain ontology. This paper describes a type-independent search and candidate generation paradigm for QA that leverages Wikipedia characteristics. This approach is particularly useful for adapting QA systems to domains where reliable answer type identification and type-based answer extraction are not available. We present a three-pronged search approach motivated by the relations that an answer-justifying, title-oriented document may have with the question/answer pair. We further show how Wikipedia metadata such as anchor texts and redirects can be used to extract candidate answers from search results without a type ontology. Our experimental results show that our strategies obtained high binary recall in both search and candidate generation on TREC questions, a domain that has mature answer type extraction technology, as well as on Jeopardy! questions, a domain without such technology. Our high-recall search and candidate generation approach has also led to high overall QA performance in Watson, our end-to-end system.

Published

2011-08-04

How to Cite

Chu-Carroll, J., & Fan, J. (2011). Leveraging Wikipedia Characteristics for Search and Candidate Generation in Question Answering. Proceedings of the AAAI Conference on Artificial Intelligence, 25(1), 872-877. https://doi.org/10.1609/aaai.v25i1.7968

Section

AAAI Technical Track: Natural Language Processing