1. Pande M, Budhraja A, Nema P, Kumar P, Khapra MM. The Heads Hypothesis: A Unifying Statistical Approach Towards Understanding Multi-Headed Attention in BERT. AAAI [Internet]. 2021 May 18 [cited 2024 Sep 27];35(15):13613-21. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/17605