SciTaiL: A Textual Entailment Dataset from Science Question Answering

Tushar Khot; Ashish Sabharwal; Peter Clark

doi:10.1609/aaai.v32i1.12022

Authors

Tushar Khot Allen Institute for Artificial Intelligence
Ashish Sabharwal Allen Institute for Artificial Intelligence
Peter Clark Allen Institute for Artificial Intelligence

DOI:

https://doi.org/10.1609/aaai.v32i1.12022

Keywords:

textual entailment, dataset, neural networks, structured entailment, science question answering

Abstract

We present a new dataset and model for textual entailment, derived from treating multiple-choice question answering as an entailment problem. SciTail is the first entailment set that is created solely from natural sentences that already exist independently "in the wild" rather than sentences authored specifically for the entailment task. Different from existing entailment datasets, we create hypotheses from science questions and the corresponding answer candidates, and premises from relevant web sentences retrieved from a large corpus. These sentences are often linguistically challenging. This, combined with the high lexical similarity of premise and hypothesis for both entailed and non-entailed pairs, makes this new entailment task particularly difficult. The difficulty is evidenced by the mediocre performance of state-of-the-art textual entailment systems on SciTail, especially in comparison to a simple majority class baseline. As a step forward, we demonstrate that one can improve accuracy on SciTail by 5% using a new neural model that exploits linguistic structure.
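The recipe described in the abstract (a question plus an answer candidate becomes a hypothesis; sentences retrieved from a corpus become premises) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' pipeline: the blank-filling heuristic, the Jaccard word-overlap "retrieval" over a three-sentence corpus, the made-up sentences, and all function names (`make_hypothesis`, `overlap`, `retrieve`, `majority_baseline_accuracy`) are hypothetical. The sketch only produces unlabeled premise/hypothesis candidates; it does not decide entailment. It also shows what a majority class baseline computes.

```python
import re
from collections import Counter


def tokens(text: str) -> set[str]:
    """Lowercased word tokens (hypothetical, deliberately crude tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def make_hypothesis(question: str, answer: str) -> str:
    """Toy question + answer -> declarative hypothesis.

    Assumed heuristic: only fill-in-the-blank questions ("___") are handled;
    real question-to-statement conversion is much harder than this.
    """
    if "___" not in question:
        raise ValueError("this sketch only handles fill-in-the-blank questions")
    filled = question.replace("___", answer)
    return filled.strip().rstrip("?.").strip() + "."


def overlap(premise: str, hypothesis: str) -> float:
    """Jaccard overlap of token sets, a rough proxy for lexical similarity."""
    p, h = tokens(premise), tokens(hypothesis)
    union = p | h
    return len(p & h) / len(union) if union else 0.0


def retrieve(hypothesis: str, corpus: list[str], k: int = 1) -> list[str]:
    """Rank corpus sentences by overlap with the hypothesis (stand-in for search)."""
    return sorted(corpus, key=lambda s: overlap(s, hypothesis), reverse=True)[:k]


def majority_baseline_accuracy(labels: list[str]) -> float:
    """Accuracy of always predicting the most frequent label."""
    if not labels:
        return 0.0
    _, count = Counter(labels).most_common(1)[0]
    return count / len(labels)


# Toy data (made-up sentences, not taken from SciTail).
CORPUS = [
    "Photosynthesis is the process by which plants convert sunlight into chemical energy.",
    "Cellular respiration releases energy stored in glucose.",
    "The Moon orbits the Earth roughly once a month.",
]
QUESTION = "Plants convert sunlight into chemical energy through ___."

if __name__ == "__main__":
    for answer in ["photosynthesis", "respiration"]:
        hyp = make_hypothesis(QUESTION, answer)
        for premise in retrieve(hyp, CORPUS, k=1):
            # Both answers retrieve the same premise (overlap 0.54 vs 0.43).
            print(f"{overlap(premise, hyp):.2f}  P: {premise}  H: {hyp}")
    # Toy labels: always guessing the most frequent one is right 2 times out of 3.
    print(majority_baseline_accuracy(["entails", "neutral", "neutral"]))
```

In this toy corpus the correct and the incorrect answer both retrieve the same premise with similar word overlap, a small-scale echo of the abstract's point that lexical similarity alone does not separate entailed from non-entailed pairs.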
