[1] Pande, M. et al. 2021. The Heads Hypothesis: A Unifying Statistical Approach Towards Understanding Multi-Headed Attention in BERT. Proceedings of the AAAI Conference on Artificial Intelligence. 35, 15 (May 2021), 13613–13621. DOI:https://doi.org/10.1609/aaai.v35i15.17605.