Residual Shuffle-Exchange Networks for Fast Processing of Long Sequences

Authors

  • Andis Draguns Institute of Mathematics and Computer Science, University of Latvia
  • Emīls Ozoliņš Institute of Mathematics and Computer Science, University of Latvia
  • Agris Šostaks Institute of Mathematics and Computer Science, University of Latvia
  • Matīss Apinis Institute of Mathematics and Computer Science, University of Latvia
  • Kārlis Freivalds Institute of Mathematics and Computer Science, University of Latvia

DOI:

https://doi.org/10.1609/aaai.v35i8.16890

Keywords:

(Deep) Neural Network Algorithms, Speech & Signal Processing

Abstract

Attention is a commonly used mechanism in sequence processing, but its O(n^2) complexity prevents its application to long sequences. The recently introduced neural Shuffle-Exchange network offers a computationally efficient alternative, enabling the modelling of long-range dependencies in O(n log n) time. The model, however, is quite complex, involving a sophisticated gating mechanism derived from the Gated Recurrent Unit. In this paper, we present a simple and lightweight variant of the Shuffle-Exchange network, based on a residual network employing GELU and Layer Normalization. The proposed architecture not only scales to longer sequences but also converges faster and provides better accuracy. It surpasses the Shuffle-Exchange network on the LAMBADA language modelling task and achieves state-of-the-art performance on the MusicNet dataset for music transcription while being efficient in the number of parameters. We show how to combine the improved Shuffle-Exchange network with convolutional layers, establishing it as a useful building block in long sequence processing applications.
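To make the O(n log n) structure concrete, the following is a minimal NumPy sketch, not the authors' implementation: residual "switch" units act on adjacent element pairs using GELU and Layer Normalization, and are interleaved with perfect-shuffle permutations so that after log2(n) layers every position can interact with every other. The helper names (`perfect_shuffle`, `residual_switch`) and the single shared weight matrix are illustrative assumptions; the actual model uses a deeper Beneš-style arrangement with learned per-layer parameters.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def layer_norm(x, eps=1e-5):
    # normalize over the feature (last) axis
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def perfect_shuffle(seq):
    # riffle the sequence: interleave the first and second halves
    half = len(seq) // 2
    out = np.empty_like(seq)
    out[0::2] = seq[:half]
    out[1::2] = seq[half:]
    return out

def residual_switch(seq, W, scale=0.5):
    # process adjacent pairs with LayerNorm -> linear map -> GELU,
    # then add the result back to the input (residual connection)
    n, d = seq.shape
    pairs = seq.reshape(n // 2, 2 * d)
    h = gelu(layer_norm(pairs) @ W)
    return seq + scale * h.reshape(n, d)

rng = np.random.default_rng(0)
n, d = 8, 4                       # sequence length must be a power of two
x = rng.standard_normal((n, d))
W = rng.standard_normal((2 * d, 2 * d)) * 0.1  # illustrative shared weights

# log2(n) switch + shuffle layers: O(n) work per layer, O(n log n) total
h = x
for _ in range(int(np.log2(n))):
    h = residual_switch(h, W)
    h = perfect_shuffle(h)

print(h.shape)  # the sequence shape is preserved: (8, 4)
```

Each layer does O(n) work on n/2 pairs, and log2(n) layers suffice for full connectivity, which is where the O(n log n) total cost comes from.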

Published

2021-05-18

How to Cite

Draguns, A., Ozoliņš, E., Šostaks, A., Apinis, M., & Freivalds, K. (2021). Residual Shuffle-Exchange Networks for Fast Processing of Long Sequences. Proceedings of the AAAI Conference on Artificial Intelligence, 35(8), 7245-7253. https://doi.org/10.1609/aaai.v35i8.16890

Section

AAAI Technical Track on Machine Learning I