QuerySum: A Multi-Document Query-Focused Summarization Dataset Augmented with Similar Query Clusters
DOI:
https://doi.org/10.1609/aaai.v38i17.29836
Keywords:
NLP: Information Extraction
Abstract
Query-focused summarization (QFS) aims to summarize the source document(s) with regard to a specific aspect of information given in a query. It plays an important role in presenting users with a concise answer summary from a set of query-relevant documents retrieved by an information retrieval system. Nonetheless, QFS research has long been hampered by the lack of adequate datasets in terms of both quality and quantity. In this paper, we introduce a large-scale multi-document query-focused summarization dataset, called QuerySum, which contains 27,041 data samples covering diverse topics and whose quality is ensured through human verification. Unlike some previous QFS datasets constructed directly from question answering datasets, 74% of the queries in our dataset are challenging non-factoid What-, Why-, and How- questions. More importantly, for each query we also provide a set of similar queries together with their corresponding summaries as retrieved context, a new feature of QuerySum. We aim to encourage research efforts in query intention understanding in the context of QFS. Leveraging QuerySum's depth, we propose a model for query-aware multi-document summarization and set a new QFS benchmark.
Published
2024-03-24
How to Cite
Liu, Y., Wang, Z., & Yuan, R. (2024). QuerySum: A Multi-Document Query-Focused Summarization Dataset Augmented with Similar Query Clusters. Proceedings of the AAAI Conference on Artificial Intelligence, 38(17), 18725-18732. https://doi.org/10.1609/aaai.v38i17.29836
Section
AAAI Technical Track on Natural Language Processing II