Li, Y., Y. Lin, T. Xiao, and J. Zhu. “An Efficient Transformer Decoder with Compressed Sub-Layers.” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 15, May 2021, pp. 13315-23, https://ojs.aaai.org/index.php/AAAI/article/view/17572.