Zhang, Chen, Luis Fernando D’Haro, Yiming Chen, Malu Zhang, and Haizhou Li. “A Comprehensive Analysis of the Effectiveness of Large Language Models as Automatic Dialogue Evaluators.” Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 17 (March 24, 2024): 19515–19524. Accessed November 23, 2024. https://ojs.aaai.org/index.php/AAAI/article/view/29923.