Li, Y., Y. Lin, T. Xiao, and J. Zhu. “An Efficient Transformer Decoder With Compressed Sub-Layers”. Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 15, May 2021, pp. 13315-23, doi:10.1609/aaai.v35i15.17572.