Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers (Student Abstract)

Danilo Dordevic; Vukasin Bozic; Joseph Thommes; Daniele Coppola; Sidak Pal Singh

doi:10.1609/aaai.v38i21.30436

Authors

Danilo Dordevic, ETH Zurich
Vukasin Bozic, ETH Zurich
Joseph Thommes, ETH Zurich
Daniele Coppola, ETH Zurich
Sidak Pal Singh, ETH Zurich

Keywords:

Distillation Learning, Transformer, Attention Mechanism, Feed-forward Networks, Natural Language Processing, Optimization

Abstract

This work analyzes how effectively standard shallow feed-forward networks can mimic the behavior of the attention mechanism in the original Transformer model, a state-of-the-art architecture for sequence-to-sequence tasks. We substitute key elements of the attention mechanism in the Transformer with simple feed-forward networks, trained via knowledge distillation with the original components as teachers. Our experiments, conducted on the IWSLT2017 dataset, reveal the capacity of these "attentionless Transformers" to rival the performance of the original architecture. Through rigorous ablation studies and experiments with various replacement network types and sizes, we offer insights that support the viability of our approach. This not only sheds light on the adaptability of shallow feed-forward networks in emulating attention mechanisms but also underscores their potential to streamline complex architectures for sequence-to-sequence tasks.
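The distillation idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: it assumes a fixed sequence length so the whole sequence can be flattened into one input vector, uses a projection-free single-head attention (Q = K = V) as the teacher, trains on random vectors rather than IWSLT2017, and uses a mean-squared-error loss. All names (`teacher_attention`, `student`, `train`, `T`, `D`, `H`) are hypothetical and chosen for illustration only.

```python
import math
import random

random.seed(0)

T, D, H = 3, 2, 16      # sequence length, token width, hidden width (illustrative values)
N_IN = T * D            # the student sees the whole flattened sequence at once


def teacher_attention(x):
    """Toy teacher: single-head scaled dot-product self-attention with Q = K = V = tokens."""
    tokens = [x[i * D:(i + 1) * D] for i in range(T)]
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(D) for k in tokens]
        m = max(scores)                                  # subtract max for a stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        for j in range(D):
            out.append(sum(w / z * v[j] for w, v in zip(weights, tokens)))
    return out


def init(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


W1, b1 = init(H, N_IN), [0.0] * H       # hidden layer (tanh)
W2, b2 = init(N_IN, H), [0.0] * N_IN    # linear output layer


def student(x):
    """Shallow feed-forward student: one hidden layer, returns (hidden, output)."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y = [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(W2, b2)]
    return h, y


def mse(data):
    total = 0.0
    for x, target in data:
        _, y = student(x)
        total += sum((yi - ti) ** 2 for yi, ti in zip(y, target)) / N_IN
    return total / len(data)


def train(data, epochs=100, lr=0.05):
    """Plain SGD on the MSE between student output and teacher output."""
    for _ in range(epochs):
        for x, target in data:
            h, y = student(x)
            dy = [2.0 * (yi - ti) / N_IN for yi, ti in zip(y, target)]
            # Backpropagate to the hidden pre-activations before W2 is updated.
            dh = [(1.0 - h[k] ** 2) * sum(dy[o] * W2[o][k] for o in range(N_IN))
                  for k in range(H)]
            for o in range(N_IN):
                for k in range(H):
                    W2[o][k] -= lr * dy[o] * h[k]
                b2[o] -= lr * dy[o]
            for k in range(H):
                for i in range(N_IN):
                    W1[k][i] -= lr * dh[k] * x[i]
                b1[k] -= lr * dh[k]


# Random inputs stand in for real token representations (an assumption of this sketch).
inputs = [[random.uniform(-1, 1) for _ in range(N_IN)] for _ in range(64)]
data = [(x, teacher_attention(x)) for x in inputs]   # teacher outputs are the targets

loss_before = mse(data)
train(data)
loss_after = mse(data)
print(f"distillation MSE: {loss_before:.4f} -> {loss_after:.4f}")
```

The student never computes pairwise token scores; it only learns a fixed mapping from the flattened sequence to the teacher's output, which is why this sketch needs a fixed sequence length.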

Published

2024-03-24

How to Cite

Dordevic, D., Bozic, V., Thommes, J., Coppola, D., & Pal Singh, S. (2024). Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers (Student Abstract). Proceedings of the AAAI Conference on Artificial Intelligence, 38(21), 23477–23479. https://doi.org/10.1609/aaai.v38i21.30436

Issue

Vol. 38 No. 21: IAAI-24, EAAI-24, AAAI-24 Student Abstracts, Undergraduate Consortium and Demonstrations

Section

AAAI Student Abstract and Poster Program
