Yin, H., Y. Chen, C. Deng, L. Cheng, H. Wang, C.-H. Tan, Q. Chen, W. Wang, and X. Li. “SpeakerLM: End-to-End Versatile Speaker Diarization and Recognition with Multimodal Large Language Models.” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 40, no. 40, Mar. 2026, pp. 34467-75, doi:10.1609/aaai.v40i40.40745.