Pande, M., Budhraja, A., Nema, P., Kumar, P. and Khapra, M. M. (2021) “The Heads Hypothesis: A Unifying Statistical Approach Towards Understanding Multi-Headed Attention in BERT”, Proceedings of the AAAI Conference on Artificial Intelligence, 35(15), pp. 13613–13621. doi: 10.1609/aaai.v35i15.17605.