Dordevic, Danilo, Vukasin Bozic, Joseph Thommes, Daniele Coppola, and Sidak Pal Singh. “Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers (Student Abstract).” Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 21 (March 24, 2024): 23477–23479. Accessed May 29, 2026. https://ojs.aaai.org/index.php/AAAI/article/view/30436.