Proxyformer: Nyström-Based Linear Transformer with Trainable Proxy Tokens

Authors

  • Sangho Lee, Sungkyunkwan University
  • Hayun Lee, Sungkyunkwan University
  • Dongkun Shin, Sungkyunkwan University

DOI:

https://doi.org/10.1609/aaai.v38i12.29244

Keywords:

ML: Deep Learning Algorithms, ML: Deep Neural Architectures and Foundation Models

Abstract

Transformer-based models have demonstrated remarkable performance in various domains, including natural language processing, image processing, and generative modeling. The most significant contributor to the success of Transformer models is the self-attention mechanism, which allows for a comprehensive understanding of the interactions between tokens in the input sequence. However, self-attention has a well-known scalability issue: its cost grows quadratically (i.e., O(n^2)) with the input sequence length n, making lengthy sequences challenging to handle. To address this limitation, there has been a surge of research on efficient transformers that aim to alleviate the quadratic dependency on the input sequence length. Among these, the Nyströmformer, which utilizes the Nyström method to decompose the attention matrix, achieves superior performance in both accuracy and throughput. However, its landmark selection exhibits redundancy, and the model incurs computational overhead when calculating the pseudo-inverse matrix. We propose a novel Nyström method-based transformer, called Proxyformer. Unlike the traditional approach of selecting landmarks from input tokens, Proxyformer utilizes trainable neural memory, called proxy tokens, as landmarks. By integrating contrastive learning, input injection, and a specialized dropout for the decomposed matrix, Proxyformer achieves top-tier performance on long-sequence tasks in the Long Range Arena benchmark.
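For intuition, the Nyström scheme the abstract refers to approximates the n x n softmax attention map using three small matrices built from m << n landmarks, roughly softmax(QK^T) ≈ F_tilde · pinv(A_tilde) · B_tilde. Below is a minimal PyTorch sketch of this idea with trainable proxy tokens standing in for sampled landmarks. The names (ProxyAttention, n_proxies), the reuse of a single proxy set for both landmark queries and landmark keys, and the explicit torch.linalg.pinv call (which Nyströmformer itself replaces with an iterative approximation) are simplifying assumptions for illustration, not the authors' exact design.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ProxyAttention(nn.Module):
        """Illustrative Nyström-style attention with trainable proxy tokens."""

        def __init__(self, dim, n_proxies=32):
            super().__init__()
            self.scale = dim ** -0.5
            # Trainable neural memory used in place of landmarks sampled
            # from the input sequence (hypothetical parameterization).
            self.proxy = nn.Parameter(torch.randn(n_proxies, dim))

        def forward(self, q, k, v):
            # q, k, v: (batch, n, dim); proxies: (m, dim) with m << n
            p = self.proxy.unsqueeze(0).expand(q.size(0), -1, -1)

            # Three small softmax kernels replace the full n x n map:
            # f: (n, m) queries->proxies, a: (m, m) proxies->proxies,
            # b: (m, n) proxies->keys.
            f = F.softmax(q @ p.transpose(-1, -2) * self.scale, dim=-1)
            a = F.softmax(p @ p.transpose(-1, -2) * self.scale, dim=-1)
            b = F.softmax(p @ k.transpose(-1, -2) * self.scale, dim=-1)

            # Nyström reconstruction; pinv costs O(m^3), independent of n,
            # so the overall cost is linear in sequence length.
            return f @ torch.linalg.pinv(a) @ (b @ v)

    # Usage: a 4096-token sequence never materializes a 4096 x 4096 matrix.
    attn = ProxyAttention(dim=64, n_proxies=16)
    q = k = v = torch.randn(2, 4096, 64)
    out = attn(q, k, v)  # shape: (2, 4096, 64)

The contrastive learning, input injection, and decomposed-matrix dropout components mentioned in the abstract, which govern how the trainable landmarks are kept informative, are omitted from this sketch.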

Published

2024-03-24

How to Cite

Lee, S., Lee, H., & Shin, D. (2024). Proxyformer: Nyström-Based Linear Transformer with Trainable Proxy Tokens. Proceedings of the AAAI Conference on Artificial Intelligence, 38(12), 13418-13426. https://doi.org/10.1609/aaai.v38i12.29244

Section

AAAI Technical Track on Machine Learning III