Distilling Structured Rationale from Large Language Models to Small Language Models for Abstractive Summarization
DOI:
https://doi.org/10.1609/aaai.v39i24.34727
Abstract
Large Language Models (LLMs) have permeated various Natural Language Processing (NLP) tasks. For summarization tasks, LLMs can generate well-structured rationales consisting of Essential Aspects (EA), Associated Sentences (AS), and Triple Entity Relations (TER). These rationales guide smaller models (≤1B) to produce better summaries. However, the high deployment costs of LLMs (≥70B), such as substantial storage space and high computing requirements, limit their utilization in resource-constrained environments. Furthermore, effectively distilling these structured rationales from LLMs into Small Language Models (SLMs) remains a challenge. To address this, we propose the LLM-based Structured Rationale-guided Multi-view Weak-gated Fusion framework (LSR-MWF). The framework first employs LLMs to mine structured rationales from a document from multiple viewpoints, namely EA, AS, and TER. It then develops a multi-step summary-generation evaluation strategy to select high-quality structured rationales. Subsequently, it aligns with these rationales using additional modules organized in a hierarchical structure. Finally, the framework integrates the features output by these modules with the original abstractive model through a weak-gated mechanism. Experimental results on the two publicly available CNN/DailyMail and XSum datasets show that our method improves the performance of the abstractive model, outperforming baselines by 11.2% and 5.8%, respectively. In addition, our method improves the interpretability of summary generation from the viewpoints of EA, AS, and TER.
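The weak-gated fusion the abstract mentions can be illustrated with a minimal sketch. This is not the paper's implementation: the function name, the per-view scalar gates, and the sigmoid squashing are all assumptions used purely to convey the idea of letting rationale-view features (EA, AS, TER) weakly modulate the abstractive model's representation.

```python
import numpy as np

def weak_gated_fusion(h_base, rationale_feats, gate_params):
    """Hypothetical sketch of a weak-gated fusion.

    h_base         : base feature vector from the abstractive model
    rationale_feats: list of feature vectors, one per rationale view
                     (e.g. EA, AS, TER)
    gate_params    : list of learned scalars, one per view

    Each gate is squashed to (0, 1) with a sigmoid so that rationale
    features only weakly adjust the base representation rather than
    overwrite it.
    """
    fused = h_base.astype(float).copy()
    for feats, w in zip(rationale_feats, gate_params):
        gate = 1.0 / (1.0 + np.exp(-w))  # sigmoid gate in (0, 1)
        fused = fused + gate * np.asarray(feats, dtype=float)
    return fused

# With a zero base vector, one all-ones view, and gate parameter 0
# (sigmoid(0) = 0.5), every fused component equals 0.5.
out = weak_gated_fusion(np.zeros(4), [np.ones(4)], [0.0])
```

In a real model the gates and view features would be produced by the hierarchical rationale-alignment modules described in the paper; here they are fixed inputs for illustration only.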
Published
2025-04-11
How to Cite
Wang, L., Wu, L., Song, S., Wang, Y., Gao, C., & Wang, K. (2025). Distilling Structured Rationale from Large Language Models to Small Language Models for Abstractive Summarization. Proceedings of the AAAI Conference on Artificial Intelligence, 39(24), 25389-25397. https://doi.org/10.1609/aaai.v39i24.34727
Section
AAAI Technical Track on Natural Language Processing III