How to Ask Better Questions? A Large-Scale Multi-Domain Dataset for Rewriting Ill-Formed Questions

Zewei Chu; Mingda Chen; Jing Chen; Miaosen Wang; Kevin Gimpel; Manaal Faruqui; Xiance Si

doi:10.1609/aaai.v34i05.6258

Authors

Zewei Chu, The University of Chicago
Mingda Chen, Toyota Technological Institute at Chicago
Jing Chen, Google Assistant
Miaosen Wang, Google Assistant
Kevin Gimpel, Toyota Technological Institute at Chicago
Manaal Faruqui, Google Assistant
Xiance Si, Google Assistant

DOI:

https://doi.org/10.1609/aaai.v34i05.6258

Abstract

We present a large-scale dataset for the task of rewriting an ill-formed natural language question into a well-formed one. Our multi-domain question rewriting (MQR) dataset is constructed from human-contributed Stack Exchange question edit histories. The dataset contains 427,719 question pairs from 303 domains. We provide human annotations for a subset of the dataset as a quality estimate. When moving from ill-formed to well-formed questions, the question quality improves by an average of 45 points across three aspects. We train sequence-to-sequence neural models on the constructed dataset and obtain an improvement of 13.2% in BLEU-4 over baseline methods built from other data resources. We release the MQR dataset to encourage research on the problem of question rewriting.

Published

2020-04-03

How to Cite

Chu, Z., Chen, M., Chen, J., Wang, M., Gimpel, K., Faruqui, M., & Si, X. (2020). How to Ask Better Questions? A Large-Scale Multi-Domain Dataset for Rewriting Ill-Formed Questions. Proceedings of the AAAI Conference on Artificial Intelligence, 34(05), 7586-7593. https://doi.org/10.1609/aaai.v34i05.6258

Issue

Vol. 34 No. 05: AAAI-20 Technical Tracks 5

Section

AAAI Technical Track: Natural Language Processing
