[1]
Pande, M., Budhraja, A., Nema, P., Kumar, P. and Khapra, M.M. 2021. The Heads Hypothesis: A Unifying Statistical Approach Towards Understanding Multi-Headed Attention in BERT. Proceedings of the AAAI Conference on Artificial Intelligence. 35, 15 (May 2021), 13613-13621.