QuerySum: A Multi-Document Query-Focused Summarization Dataset Augmented with Similar Query Clusters

Authors

  • Yushan Liu, Fudan University
  • Zili Wang, INF Technology (Shanghai) Co., Ltd.
  • Ruifeng Yuan, Hong Kong Polytechnic University

DOI:

https://doi.org/10.1609/aaai.v38i17.29836

Keywords:

NLP: Information Extraction

Abstract

Query-focused summarization (QFS) aims to summarize the source document(s) with regard to a specific aspect of information given in a query. It plays an important role in presenting users with a concise answer summary from a set of query-relevant documents retrieved by an information retrieval system. Nonetheless, QFS research has long been hampered by the lack of adequate datasets in terms of both quality and quantity. In this paper, we introduce a large-scale multi-document query-focused summarization dataset, called QuerySum, which contains 27,041 data samples covering diverse topics and whose quality is guaranteed through human verification. Unlike some previous QFS datasets constructed directly from question answering datasets, 74% of the queries in our dataset are challenging non-factoid What-, Why-, and How- questions. More importantly, for each query we also provide a set of similar queries paired with their corresponding summaries as retrieved context, a new feature of QuerySum. We aim to encourage research efforts in query intention understanding in the context of QFS. Leveraging QuerySum's depth, we propose a model for query-aware multi-document summarization and set a new QFS benchmark.

Published

2024-03-24

How to Cite

Liu, Y., Wang, Z., & Yuan, R. (2024). QuerySum: A Multi-Document Query-Focused Summarization Dataset Augmented with Similar Query Clusters. Proceedings of the AAAI Conference on Artificial Intelligence, 38(17), 18725-18732. https://doi.org/10.1609/aaai.v38i17.29836

Section

AAAI Technical Track on Natural Language Processing II