[1] Pande, M., Budhraja, A., Nema, P., Kumar, P. and Khapra, M.M. 2021. The Heads Hypothesis: A Unifying Statistical Approach Towards Understanding Multi-Headed Attention in BERT. Proceedings of the AAAI Conference on Artificial Intelligence. 35, 15 (May 2021), 13613-13621. DOI:https://doi.org/10.1609/aaai.v35i15.17605.