Murugadoss, Bhuvanashree, Christian Poelitz, Ian Drosos, Vu Le, Nick McKenna, Carina Suzana Negreanu, Chris Parnin, and Advait Sarkar. 2025. “Evaluating the Evaluator: Measuring LLMs’ Adherence to Task Evaluation Instructions”. Proceedings of the AAAI Conference on Artificial Intelligence 39 (18):19589-97. https://doi.org/10.1609/aaai.v39i18.34157.