Structured Packing in LLM Training Improves Long Context Utilization
DOI:
https://doi.org/10.1609/aaai.v39i24.34706
Abstract
Recent advancements in long-context language modeling have attracted significant attention, yet their practical applications often suffer from suboptimal context utilization. To efficiently address this issue, we introduce Structured Packing for Long Context (SPLiCe), a method that uses retrieval to collate mutually relevant documents into long training samples. We demonstrate that SPLiCe improves performance on long-context tasks, notably achieving perfect accuracy on the synthetic Needle in the Haystack benchmark and effectively mitigating the 'lost-in-the-middle' phenomenon often observed in large language models. These long-context capabilities also extend to realistic downstream tasks, such as Qasper, across multiple model sizes—3B, 7B, and 13B—and are achieved with only brief fine-tuning on 2-6 billion tokens. We supplement these results with a detailed analysis of SPLiCe, examining the impact of hyperparameter choices, of different mixtures and proportions of SPLiCe-generated training data, and of the choice of retriever. We also study the transfer of long-context utilization skills between modalities. An intriguing finding from our analysis is that training on a corpus of code can enhance performance on natural language tasks.
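The core idea described in the abstract — packing mutually relevant documents into a single long training sample via retrieval — can be illustrated with a minimal sketch. This is not the authors' implementation: the retriever here is a toy Jaccard similarity over word sets (the paper studies the choice of retriever separately), and `splice_pack` is a hypothetical helper name.

```python
def jaccard(a, b):
    # Toy stand-in retriever: word-set Jaccard similarity between two documents.
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / max(len(sa | sb), 1)

def splice_pack(docs, target_len):
    # Greedily build one long training sample: start from a seed document,
    # then repeatedly append the most similar remaining document until the
    # sample reaches an approximate token budget (target_len).
    remaining = list(docs)
    sample = [remaining.pop(0)]               # seed with the first document
    length = len(sample[0].split())
    while remaining and length < target_len:
        query = sample[-1]                    # retrieve relative to the latest addition
        best = max(range(len(remaining)),
                   key=lambda i: jaccard(query, remaining[i]))
        doc = remaining.pop(best)
        sample.append(doc)
        length += len(doc.split())
    return " ".join(sample)
```

Compared with the common practice of packing randomly sampled documents, this places related content within one context window, which is what the paper argues trains the model to actually use long-range information.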
Published
2025-04-11
How to Cite
Staniszewski, K., Tworkowski, S., Jaszczur, S., Zhao, Y., Michalewski, H., Kuciński, Łukasz, & Miłoś, P. (2025). Structured Packing in LLM Training Improves Long Context Utilization. Proceedings of the AAAI Conference on Artificial Intelligence, 39(24), 25201–25209. https://doi.org/10.1609/aaai.v39i24.34706
Section
AAAI Technical Track on Natural Language Processing III