Preserve Context Information for Extract-Generate Long-Input Summarization Framework
DOI: https://doi.org/10.1609/aaai.v37i11.26631
Keywords: SNLP: Summarization
Abstract
The extract-generate framework has been a classic approach to text summarization. As pretrained language models struggle with long-input summarization because of their high memory cost, the extract-generate framework has regained researchers' interest. However, its effectiveness on long inputs comes at the cost of losing context information. In this paper, we present a context-aware extract-generate framework (CAEG) for long-input text summarization. It preserves both local and global context information at little cost and can be applied to most existing extract-generate summarization models. CAEG generates a set of context-related text spans, called context prompts, for each text snippet and uses them to transfer context information between the extractor and the generator. To find such context prompts, we propose to capture context information based on the interpretation of the extractor: the text spans contributing most to the extraction decision are considered to contain the richest context information. We evaluate our approach on a long-document and a long-dialogue summarization dataset, arXiv and QMSum. The experimental results show that CAEG achieves state-of-the-art results on QMSum and outperforms other extract-generate-based models on arXiv.
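The mechanism the abstract describes can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes per-span attribution scores from the extractor's interpretation are already available (the paper derives them from the extractor; that step is not reproduced here), and the function names and the separator format are hypothetical.

```python
def select_context_prompts(spans, scores, k=2):
    """Pick the k candidate spans with the highest contribution to the
    extraction decision; these act as the context prompts.
    `spans` and `scores` are parallel lists (scores are assumed given)."""
    ranked = sorted(zip(spans, scores), key=lambda pair: pair[1], reverse=True)
    return [span for span, _ in ranked[:k]]

def build_generator_input(snippet, prompts, sep=" | "):
    """Prepend the context prompts to an extracted snippet so the generator
    sees the preserved context (the joining format is illustrative)."""
    return sep.join(prompts + [snippet])

# Toy example with made-up spans and attribution scores.
spans = ["the meeting agenda", "budget approval", "coffee break"]
scores = [0.9, 0.7, 0.1]
prompts = select_context_prompts(spans, scores, k=2)
print(build_generator_input("The board approved the budget.", prompts))
```

In this sketch, ranking by attribution score stands in for the interpretation-based selection; the actual framework would compute those scores from the extractor model itself.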
Published
2023-06-26
How to Cite
Yuan, R., Wang, Z., Cao, Z., & Li, W. (2023). Preserve Context Information for Extract-Generate Long-Input Summarization Framework. Proceedings of the AAAI Conference on Artificial Intelligence, 37(11), 13932-13939. https://doi.org/10.1609/aaai.v37i11.26631
Issue
Section
AAAI Technical Track on Speech & Natural Language Processing