[1] D. Dordevic, V. Bozic, J. Thommes, D. Coppola, and S. Pal Singh, “Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers (Student Abstract),” AAAI, vol. 38, no. 21, pp. 23477–23479, Mar. 2024.