Murugadoss, Bhuvanashree, Christian Poelitz, Ian Drosos, Vu Le, Nick McKenna, Carina Suzana Negreanu, Chris Parnin, and Advait Sarkar. “Evaluating the Evaluator: Measuring LLMs’ Adherence to Task Evaluation Instructions”. Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 18 (April 11, 2025): 19589–19597. Accessed May 30, 2026. https://ojs.aaai.org/index.php/AAAI/article/view/34157.