Topic Concentration in Query Focused Summarization Datasets

Authors

  • Tal Baumel Ben-Gurion University
  • Raphael Cohen Ben-Gurion University
  • Michael Elhadad Ben-Gurion University

DOI:

https://doi.org/10.1609/aaai.v30i1.10323

Keywords:

Automatic summarization, datasets, evaluation, QFS, IR

Abstract

Query-Focused Summarization (QFS) summarizes a document cluster in response to a specific input query. QFS algorithms must combine query relevance assessment, central content identification, and redundancy avoidance. Frustratingly, state-of-the-art algorithms designed for QFS do not significantly improve upon generic summarization methods, which ignore query relevance, when evaluated on traditional QFS datasets. We hypothesize that this lack of success stems from the nature of the datasets. We define a task-based method to quantify topic concentration in datasets, i.e., the ratio of sentences within the dataset that are relevant to the query, and observe that the DUC 2005, 2006, and 2007 datasets suffer from very high topic concentration. We introduce TD-QFS, a new QFS dataset with controlled levels of topic concentration. We compare competitive baseline algorithms on TD-QFS and report strong improvements in ROUGE performance for algorithms that properly model query relevance, as opposed to generic summarizers. We further present three new and simple QFS algorithms, RelSum, ThresholdSum, and TFIDF-KLSum, that outperform state-of-the-art QFS algorithms on the TD-QFS dataset by a large margin.
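The abstract defines topic concentration as the ratio of sentences in a dataset that are relevant to the query. The snippet below is a minimal sketch of that ratio only, assuming per-sentence relevance judgments are already available as booleans; it does not reproduce the paper's task-based method for obtaining those judgments. The function name `topic_concentration` and the example labels are hypothetical and used purely for illustration.

```python
from typing import Sequence


def topic_concentration(relevant: Sequence[bool]) -> float:
    """Return the fraction of sentences in a document cluster judged relevant to the query.

    `relevant` holds one boolean per sentence (assumed input). How those
    judgments are produced (the paper's task-based method) is not modeled here.
    """
    if not relevant:
        raise ValueError("document cluster has no sentences")
    # True counts as 1 and False as 0, so the sum is the number of relevant sentences.
    return sum(relevant) / len(relevant)


# Hypothetical relevance labels for a 10-sentence cluster (illustrative only, not real data).
labels = [True, True, False, True, True, True, False, True, True, True]
print(topic_concentration(labels))  # 0.8
```

Under this reading, a value close to 1.0 means nearly every sentence in the cluster is query-relevant, which is the high-concentration situation the abstract attributes to the DUC 2005 to 2007 datasets.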

Published

2016-03-05

How to Cite

Baumel, T., Cohen, R., & Elhadad, M. (2016). Topic Concentration in Query Focused Summarization Datasets. Proceedings of the AAAI Conference on Artificial Intelligence, 30(1). https://doi.org/10.1609/aaai.v30i1.10323

Issue

Vol. 30 No. 1 (2016)

Section

Technical Papers: NLP and Knowledge Representation