Memory-Augmenting Decoder-Only Language Models through Encoders (Student Abstract)

Alessio Galatolo; Katie Winkle

doi:10.1609/aaai.v38i21.30444

Authors

Alessio Galatolo Uppsala University
Katie Winkle Uppsala University

DOI:

https://doi.org/10.1609/aaai.v38i21.30444

Keywords:

Transformer-based Language Model, Transformer Architecture, Large Language Models, Preferences

Abstract

The Transformer architecture has received considerable attention in recent years, in part thanks to its ability to scale well and to allow massive parallelism during training. This has made possible the development of Language Models (LMs) of increasing size and the discovery of latent abilities that completely outclass traditional methods, e.g. rule-based systems. However, these models have also introduced new issues, such as their inability to retain the history of previous interactions, due to their stateless nature, and the difficulty of controlling their generation. Various attempts have been made to address these issues. For example, a 'brute force' approach to the memory issue is to include the full conversation history in the context window, a solution limited by the quadratic scaling of Transformers with input length. In this work, we explore computationally practical solutions to the memory problem. We propose to augment the decoder-only architecture of (most) Large LMs with a (relatively small) memory encoder. Its output is prepended to the decoder's input, in a similar fashion to recent work on Adapters and to the original Transformer architecture. Initial experiments show promising results; however, future work is needed to compare with State-of-the-Art methods.
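
The idea of prepending a compact memory to the decoder's input can be illustrated with a toy sketch. Everything below is an assumption made for illustration and not the authors' implementation: the function names, the sizes, and especially the "encoder", which is replaced here by simple mean-pooling so that the example runs with the Python standard library alone. The sketch only shows the data flow (history, then memory slots, then prepended decoder input) and why a short memory is cheaper than replaying the full history, using the number of pairwise attention scores per layer as a rough cost proxy.

```python
import random


def embed(tokens, dim, seed=0):
    """Toy embedding (hypothetical): one pseudo-random vector per token."""
    vectors = []
    for tok in tokens:
        rng = random.Random(f"{seed}:{tok}")  # same token -> same vector
        vectors.append([rng.uniform(-1.0, 1.0) for _ in range(dim)])
    return vectors


def memory_encoder(history_vectors, num_slots):
    """Stand-in for a small memory encoder (assumption: mean-pooling).

    Compresses the history into at most `num_slots` vectors by averaging
    consecutive chunks. A real encoder would be a learned network.
    """
    if not history_vectors:
        return []
    dim = len(history_vectors[0])
    num_slots = min(num_slots, len(history_vectors))
    chunk = -(-len(history_vectors) // num_slots)  # ceiling division
    slots = []
    for start in range(0, len(history_vectors), chunk):
        group = history_vectors[start:start + chunk]
        slots.append([sum(v[i] for v in group) / len(group) for i in range(dim)])
    return slots


def build_decoder_input(memory_slots, prompt_vectors):
    """Prepend the memory slots to the embedded prompt."""
    return memory_slots + prompt_vectors


def attention_pairs(seq_len):
    """Pairwise attention scores per layer: grows quadratically with length."""
    return seq_len * seq_len


# Hypothetical sizes: 100 history tokens, a 5-token prompt, 8 memory slots.
history = [f"h{i}" for i in range(100)]
prompt = ["what", "did", "I", "say", "?"]
dim = 4

memory = memory_encoder(embed(history, dim), num_slots=8)
decoder_input = build_decoder_input(memory, embed(prompt, dim))

full_context_cost = attention_pairs(len(history) + len(prompt))
memory_cost = attention_pairs(len(decoder_input))
print(len(decoder_input), full_context_cost, memory_cost)
```

With these made-up sizes, the decoder sees 13 positions instead of 105, so the per-layer attention cost proxy drops from 105 × 105 to 13 × 13. The sketch does not address what information the memory retains, which is the part a learned encoder would have to solve.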
