Semi-Distantly Supervised Neural Model for Generating Compact Answers to Open-Domain Why Questions
Keywords: Summarization, Deep Learning, Neural Networks, Query, Question, Distant Supervision
This paper proposes a neural network-based method for generating compact answers to open-domain why-questions (e.g., "Why was Mr. Trump elected as the president of the US?"). Unlike factoid question answering methods, which provide short text spans as answers, existing work on why-question answering has aimed at answering questions by retrieving relatively long text passages, each often consisting of several sentences, from a text archive. While the actual answer to a why-question may be expressed over several consecutive sentences, these sentences often contain redundant and/or unrelated parts. Such answers are not suitable for spoken dialog systems and smart speakers such as Amazon Echo, which have been receiving much attention recently. In this work, we aim at generating non-redundant compact answers to why-questions from answer passages retrieved from a very large web corpus (4 billion web pages) by an existing open-domain why-question answering system, using a novel neural network obtained by extending existing summarization methods. We also automatically generate training data using a large number of causal relations automatically extracted from 4 billion web pages by an existing supervised causality recognizer. This data is used to train our neural network, together with manually created training data. Through a series of experiments, we show that both our novel neural network and the automatically generated training data improve the quality of the generated answers, both in ROUGE score and in a subjective evaluation.