[1] H. Yin, “SpeakerLM: End-to-End Versatile Speaker Diarization and Recognition with Multimodal Large Language Models”, AAAI, vol. 40, no. 40, pp. 34467-34475, Mar. 2026.