Zhao, T., Singh, K. Y., Appalaraju, S., Tang, P., Mahadevan, V., Manmatha, R., & Wu, Y. N. (2024). No Head Left Behind – Multi-Head Alignment Distillation for Transformers. Proceedings of the AAAI Conference on Artificial Intelligence, 38(7), 7514-7524. https://doi.org/10.1609/aaai.v38i7.28583