Zhang C, D’Haro LF, Chen Y, Zhang M, Li H. A Comprehensive Analysis of the Effectiveness of Large Language Models as Automatic Dialogue Evaluators. AAAI [Internet]. 2024 Mar. 24 [cited 2026 May 26];38(17):19515-24. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/29923