[1]
Z. Zhang, W. Shao, Y. Ge, X. Wang, J. Gu, and P. Luo, “Cached Transformers: Improving Transformers with Differentiable Memory Cache”, AAAI, vol. 38, no. 15, pp. 16935–16943, Mar. 2024.