1. Pande M, Budhraja A, Nema P, Kumar P, Khapra MM. The Heads Hypothesis: A Unifying Statistical Approach Towards Understanding Multi-Headed Attention in BERT. AAAI [Internet]. 2021 May 18 [cited 2022 Jan 26];35(15):13613-21. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/17605