Distilling Structured Rationale from Large Language Models to Small Language Models for Abstractive Summarization
DOI:
https://doi.org/10.1609/aaai.v39i24.34727
Abstract
Large Language Models (LLMs) have permeated various Natural Language Processing (NLP) tasks. For summarization tasks, LLMs can generate well-structured rationales consisting of Essential Aspects (EA), Associated Sentences (AS), and Triple Entity Relations (TER). These rationales guide smaller models (≤1B) to produce better summaries. However, the high deployment costs of LLMs (≥70B), such as substantial storage space and high computing requirements, limit their utilization in resource-constrained environments. Furthermore, effectively distilling these structured rationales from LLMs into Small Language Models (SLMs) remains a challenge. To address this, we propose the LLM-based Structured Rationale-guided Multi-view Weak-gated Fusion framework (LSR-MWF). The framework first employs LLMs to mine structured rationales from a document from multiple viewpoints, namely EA, AS, and TER. It then develops a multi-step summary-generation evaluation strategy to select high-quality structured rationales. Subsequently, it aligns with these rationales using additional modules organized in a hierarchical structure. Finally, the framework integrates the features output by these modules with the original abstractive model through a weak-gated mechanism. Experimental results on the two publicly available CNN/DailyMail and XSum datasets show that our method improves the performance of the abstractive model, outperforming baselines by 11.2% and 5.8%, respectively. In addition, our method improves the interpretability of summary generation from the viewpoints of EA, AS, and TER.
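The weak-gated fusion the abstract mentions can be illustrated with a minimal sketch. This is not the paper's implementation: the function name, the per-view scalar gates, and the sigmoid squashing are all assumptions used purely to convey the idea of letting rationale-view features (EA, AS, TER) weakly modulate the abstractive model's representation.

```python
import numpy as np

def weak_gated_fusion(h_base, rationale_feats, gate_params):
    """Hypothetical sketch of a weak-gated fusion.

    h_base         : base feature vector from the abstractive model
    rationale_feats: list of feature vectors, one per rationale view
                     (e.g. EA, AS, TER)
    gate_params    : list of learned scalars, one per view

    Each gate is squashed to (0, 1) with a sigmoid so that rationale
    features only weakly adjust the base representation rather than
    overwrite it.
    """
    fused = h_base.astype(float).copy()
    for feats, w in zip(rationale_feats, gate_params):
        gate = 1.0 / (1.0 + np.exp(-w))  # sigmoid gate in (0, 1)
        fused = fused + gate * np.asarray(feats, dtype=float)
    return fused

# With a zero base vector, one all-ones view, and gate parameter 0
# (sigmoid(0) = 0.5), every fused component equals 0.5.
out = weak_gated_fusion(np.zeros(4), [np.ones(4)], [0.0])
```

In a real model the gates and view features would be produced by the hierarchical rationale-alignment modules described in the paper; here they are fixed inputs for illustration only.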
Published
2025-04-11
How to Cite
Wang, L., Wu, L., Song, S., Wang, Y., Gao, C., & Wang, K. (2025). Distilling Structured Rationale from Large Language Models to Small Language Models for Abstractive Summarization. Proceedings of the AAAI Conference on Artificial Intelligence, 39(24), 25389-25397. https://doi.org/10.1609/aaai.v39i24.34727
Section
AAAI Technical Track on Natural Language Processing III