Transformer with Memory Replay

Authors

  • Rui Liu, University of Michigan, Ann Arbor
  • Barzan Mozafari, University of Michigan, Ann Arbor

DOI:

https://doi.org/10.1609/aaai.v36i7.20722

Keywords:

Machine Learning (ML), Speech & Natural Language Processing (SNLP)

Abstract

Transformers achieve state-of-the-art performance on natural language processing tasks by pre-training on large-scale text corpora. They are extremely compute-intensive and have very high sample complexity. Memory replay is a mechanism that remembers and reuses past examples by saving them to and replaying them from a memory buffer. It has been successfully used in reinforcement learning and GANs due to its better sample efficiency. In this paper, we propose Transformer with Memory Replay, which integrates memory replay with transformers, making them more sample-efficient. Experiments on the GLUE and SQuAD benchmark datasets show that Transformer with Memory Replay achieves at least a 1-percentage-point increase over the baseline transformer model when pre-trained with the same number of examples. Further, by adopting a careful design that reduces the wall-clock time overhead of memory replay, we also empirically achieve better runtime efficiency.
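The sketch below illustrates the memory-replay idea described in the abstract: past training batches are saved to a buffer and periodically replayed during transformer pre-training. It is a minimal, assumed implementation in a PyTorch-style loop; the buffer class, FIFO capacity, random sampling policy, and the `replay_every` frequency are illustrative choices for this example, not the authors' exact design.

    # Illustrative sketch of memory replay for transformer pre-training.
    # Buffer design, sampling policy, and replay frequency are assumptions
    # made for this example, not the mechanism proposed in the paper.
    import random
    from collections import deque

    import torch
    import torch.nn as nn

    class ReplayBuffer:
        """Fixed-size FIFO buffer that stores past training batches."""
        def __init__(self, capacity: int = 256):
            self.buffer = deque(maxlen=capacity)

        def add(self, batch: torch.Tensor) -> None:
            # Detach so stored batches do not keep the autograd graph alive.
            self.buffer.append(batch.detach().clone())

        def sample(self) -> torch.Tensor:
            return random.choice(list(self.buffer))

        def __len__(self) -> int:
            return len(self.buffer)

    # Toy stand-ins for a transformer encoder and its pre-training objective.
    vocab_size, hidden = 1000, 64
    model = nn.Sequential(
        nn.Embedding(vocab_size, hidden),
        nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True),
        nn.Linear(hidden, vocab_size),
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()
    buffer = ReplayBuffer(capacity=128)
    replay_every = 4  # hypothetical replay frequency

    def step(batch: torch.Tensor) -> None:
        logits = model(batch)  # (batch, seq_len, vocab)
        loss = loss_fn(logits.view(-1, vocab_size), batch.view(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    for it in range(20):
        fresh = torch.randint(0, vocab_size, (8, 32))  # fresh pre-training batch
        step(fresh)
        buffer.add(fresh)
        # Periodically replay a previously seen batch from the buffer.
        if len(buffer) > 0 and it % replay_every == 0:
            step(buffer.sample())

The point of the design is that replayed batches reuse data already fetched and tokenized, so the extra gradient steps improve sample efficiency without drawing new examples from the corpus.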

Published

2022-06-28

How to Cite

Liu, R., & Mozafari, B. (2022). Transformer with Memory Replay. Proceedings of the AAAI Conference on Artificial Intelligence, 36(7), 7567-7575. https://doi.org/10.1609/aaai.v36i7.20722

Section

AAAI Technical Track on Machine Learning II