Zhang, Z., Shao, W., Ge, Y., Wang, X., Gu, J., & Luo, P. (2024). Cached Transformers: Improving Transformers with Differentiable Memory Cache. Proceedings of the AAAI Conference on Artificial Intelligence, 38(15), 16935–16943. https://doi.org/10.1609/aaai.v38i15.29636