[1] Zhang, C., D’Haro, L.F., Chen, Y., Zhang, M. and Li, H. 2024. A Comprehensive Analysis of the Effectiveness of Large Language Models as Automatic Dialogue Evaluators. Proceedings of the AAAI Conference on Artificial Intelligence. 38, 17 (Mar. 2024), 19515-19524. DOI:https://doi.org/10.1609/aaai.v38i17.29923.