[1] M. Pande, A. Budhraja, P. Nema, P. Kumar, and M. M. Khapra, “The Heads Hypothesis: A Unifying Statistical Approach Towards Understanding Multi-Headed Attention in BERT”, AAAI, vol. 35, no. 15, pp. 13613–13621, May 2021.