An Efficient Transformer Decoder with Compressed Sub-layers

Yanyang Li; Ye Lin; Tong Xiao; Jingbo Zhu

doi:10.1609/aaai.v35i15.17572

Authors

Yanyang Li Northeastern University, China
Ye Lin Northeastern University, China
Tong Xiao Northeastern University, China
Jingbo Zhu Northeastern University, China

Keywords:

Machine Translation & Multilinguality

Abstract

The large attention-based encoder-decoder network (Transformer) has recently become prevalent thanks to its effectiveness, but the high computational complexity of its decoder makes it inefficient. By examining the mathematical formulation of the decoder, we show that under some mild conditions the architecture can be simplified by compressing its sub-layers, the basic building blocks of the Transformer, thereby achieving higher parallelism. We therefore propose the Compressed Attention Network, whose decoder layer consists of only one sub-layer instead of three. Extensive experiments on 14 WMT machine translation tasks show that our model is 1.42x faster than a strong baseline while performing on par with it. This strong baseline is itself already 2x faster than the widely used standard baseline, with no loss in performance.

Published

2021-05-18

How to Cite

Li, Y., Lin, Y., Xiao, T., & Zhu, J. (2021). An Efficient Transformer Decoder with Compressed Sub-layers. Proceedings of the AAAI Conference on Artificial Intelligence, 35(15), 13315-13323. https://doi.org/10.1609/aaai.v35i15.17572

Issue

Vol. 35 No. 15: AAAI-21 Technical Tracks 15

Section

AAAI Technical Track on Speech and Natural Language Processing II
