Structured Packing in LLM Training Improves Long Context Utilization
DOI:
https://doi.org/10.1609/aaai.v39i24.34706
Abstract
Recent advancements in long-context language modeling have attracted significant attention, yet their practical applications often suffer from suboptimal context utilization. To efficiently address this issue, we introduce Structured Packing for Long Context (SPLiCe), a method that uses retrieval to collate mutually relevant documents into long training samples. We demonstrate that SPLiCe improves performance on long-context tasks, notably achieving perfect accuracy on the synthetic Needle in the Haystack benchmark and effectively mitigating the 'lost-in-the-middle' phenomenon often observed in large language models. These long-context capabilities also extend to realistic downstream tasks, such as Qasper, across multiple model sizes—3B, 7B, and 13B—and are achieved with only brief fine-tuning on 2-6 billion tokens. We supplement these results with a detailed analysis of SPLiCe, examining the impact of hyperparameter choices, of different mixtures and proportions of SPLiCe-generated training data, and of the choice of retriever. We also study the transfer of long-context utilization skills between modalities. An intriguing finding from our analysis is that training on a corpus of code can enhance performance on natural language tasks.
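The core idea described in the abstract — packing mutually relevant documents into a single long training sample via retrieval — can be illustrated with a minimal sketch. This is not the authors' implementation: the retriever here is a toy Jaccard similarity over word sets (the paper studies the choice of retriever separately), and `splice_pack` is a hypothetical helper name.

```python
def jaccard(a, b):
    # Toy stand-in retriever: word-set Jaccard similarity between two documents.
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / max(len(sa | sb), 1)

def splice_pack(docs, target_len):
    # Greedily build one long training sample: start from a seed document,
    # then repeatedly append the most similar remaining document until the
    # sample reaches an approximate token budget (target_len).
    remaining = list(docs)
    sample = [remaining.pop(0)]               # seed with the first document
    length = len(sample[0].split())
    while remaining and length < target_len:
        query = sample[-1]                    # retrieve relative to the latest addition
        best = max(range(len(remaining)),
                   key=lambda i: jaccard(query, remaining[i]))
        doc = remaining.pop(best)
        sample.append(doc)
        length += len(doc.split())
    return " ".join(sample)
```

Compared with the common practice of packing randomly sampled documents, this places related content within one context window, which is what the paper argues trains the model to actually use long-range information.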
Published
2025-04-11
How to Cite
Staniszewski, K., Tworkowski, S., Jaszczur, S., Zhao, Y., Michalewski, H., Kuciński, Łukasz, & Miłoś, P. (2025). Structured Packing in LLM Training Improves Long Context Utilization. Proceedings of the AAAI Conference on Artificial Intelligence, 39(24), 25201–25209. https://doi.org/10.1609/aaai.v39i24.34706
Section
AAAI Technical Track on Natural Language Processing III