Murugadoss, B., Poelitz, C., Drosos, I., Le, V., McKenna, N., Negreanu, C. S., … Sarkar, A. (2025). Evaluating the Evaluator: Measuring LLMs’ Adherence to Task Evaluation Instructions. Proceedings of the AAAI Conference on Artificial Intelligence, 39(18), 19589–19597. https://doi.org/10.1609/aaai.v39i18.34157