Beyond Attention: Breaking the Limits of Transformer Context Length with Recurrent Memory

Aydar Bulatov; Yuri Kuratov; Yermek Kapushev; Mikhail Burtsev

doi:10.1609/aaai.v38i16.29722

Authors

Aydar Bulatov (MIPT)
Yuri Kuratov (AIRI, MIPT)
Yermek Kapushev (AIRI)
Mikhail Burtsev (LIMS)

DOI:

https://doi.org/10.1609/aaai.v38i16.29722

Keywords:

NLP: (Large) Language Models, ML: Deep Learning Algorithms, NLP: Generation

Abstract

A major limitation for the broader scope of problems solvable by transformers is the quadratic scaling of computational complexity with input size. In this study, we investigate the recurrent memory augmentation of pre-trained transformer models to extend input context length while linearly scaling compute. Our approach demonstrates the capability to store information in memory for sequences of up to an unprecedented two million tokens while maintaining high retrieval accuracy. Experiments with language modeling tasks show perplexity improvement as the number of processed input segments increases. These results underscore the effectiveness of our method, which has significant potential to enhance long-term dependency handling in natural language understanding and generation tasks, as well as enable large-scale context processing for memory-intensive applications.
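
To make the scaling claim concrete, here is a rough cost model comparing full self-attention with segment-wise processing that carries a small memory between segments. It is a simplified sketch, not the authors' implementation: the function names, `segment_len`, and `memory_len` are illustrative assumptions, and the count covers only pairwise attention scores.

```python
def full_attention_cost(n_tokens: int) -> int:
    """Pairwise attention scores for one pass over the whole input: O(n^2)."""
    return n_tokens * n_tokens


def segmented_attention_cost(n_tokens: int, segment_len: int, memory_len: int) -> int:
    """Pairwise attention scores when the input is split into fixed-size
    segments, each attended together with memory_len memory tokens.

    Assumption: every segment costs (segment_len + memory_len)^2, so the
    total grows linearly with the number of segments.
    """
    n_segments = -(-n_tokens // segment_len)  # ceiling division
    per_segment = (segment_len + memory_len) ** 2
    return n_segments * per_segment


if __name__ == "__main__":
    for n in (2048, 4096, 8192):
        print(n, full_attention_cost(n), segmented_attention_cost(n, 512, 10))
```

Under this model, doubling the input length quadruples the full-attention cost but only doubles the segmented cost, which is the sense in which compute scales linearly with input size.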

Published

2024-03-24

How to Cite

Bulatov, A., Kuratov, Y., Kapushev, Y., & Burtsev, M. (2024). Beyond Attention: Breaking the Limits of Transformer Context Length with Recurrent Memory. Proceedings of the AAAI Conference on Artificial Intelligence, 38(16), 17700-17708. https://doi.org/10.1609/aaai.v38i16.29722

Issue

Vol. 38 No. 16: AAAI-24 Technical Tracks 16

Section

AAAI Technical Track on Natural Language Processing I
