Li, Yanyang, Ye Lin, Tong Xiao, and Jingbo Zhu. “An Efficient Transformer Decoder with Compressed Sub-Layers.” Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 15 (May 18, 2021): 13315–13323. Accessed January 22, 2022. https://ojs.aaai.org/index.php/AAAI/article/view/17572.