QuerySum: A Multi-Document Query-Focused Summarization Dataset Augmented with Similar Query Clusters

Yushan Liu; Zili Wang; Ruifeng Yuan

doi:10.1609/aaai.v38i17.29836

Authors

Yushan Liu, Fudan University
Zili Wang, INF Technology (Shanghai) Co., Ltd.
Ruifeng Yuan, Hong Kong Polytechnic University

DOI:

https://doi.org/10.1609/aaai.v38i17.29836

Keywords:

NLP: Information Extraction

Abstract

Query-focused summarization (QFS) aims to summarize the source document(s) with regard to a specific aspect of information given in a query. It plays an important role in presenting users with a concise answer summary from a set of query-relevant documents retrieved by an information retrieval system. Nonetheless, QFS research has long been hampered by the lack of adequate datasets in terms of both quality and quantity. In this paper, we introduce a large-scale multi-document query-focused summarization dataset, called QuerySum, which contains 27,041 data samples covering diverse topics and whose quality is ensured through human verification. Unlike some previous QFS datasets constructed directly from question answering datasets, 74% of the queries in our dataset are challenging non-factoid What-, Why-, and How- questions. More importantly, for each query we also provide a set of similar queries paired with their corresponding summaries as retrieved context, a new feature of QuerySum. We aim to encourage research efforts in query intention understanding in the context of QFS. Leveraging QuerySum's depth, we propose a model for query-aware multi-document summarization and set a new QFS benchmark.
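
As a rough illustration of the structure the abstract describes (a query, its source documents, a reference summary, and a cluster of similar queries with their summaries), one sample might be modelled as below. This is a minimal sketch only: the class names, field names, and the question-word heuristic are all hypothetical and are not taken from the released dataset or the paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SimilarQuery:
    """A related query and its summary, supplied as retrieved context."""
    query: str
    summary: str


@dataclass
class QFSSample:
    """One multi-document QFS example (hypothetical schema, not the official one)."""
    query: str
    documents: List[str]
    summary: str
    similar_queries: List[SimilarQuery] = field(default_factory=list)


def is_non_factoid(query: str) -> bool:
    """Crude heuristic (an assumption): first word is What, Why, or How."""
    words = query.strip().lower().split()
    return bool(words) and words[0] in {"what", "why", "how"}


def non_factoid_share(samples: List[QFSSample]) -> float:
    """Fraction of samples whose query passes the heuristic above."""
    if not samples:
        return 0.0
    return sum(is_non_factoid(s.query) for s in samples) / len(samples)
```

A first-word check is only a stand-in for however the authors actually categorized queries; it will, for example, count short factoid "What" questions as non-factoid.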
