(1) Dordevic, D.; Bozic, V.; Thommes, J.; Coppola, D.; Pal Singh, S. Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks As an Alternative to Attention Layers in Transformers (Student Abstract). AAAI 2024, 38, 23477-23479.