MURUGADOSS, Bhuvanashree; POELITZ, Christian; DROSOS, Ian; LE, Vu; MCKENNA, Nick; NEGREANU, Carina Suzana; PARNIN, Chris; SARKAR, Advait. Evaluating the Evaluator: Measuring LLMs’ Adherence to Task Evaluation Instructions. Proceedings of the AAAI Conference on Artificial Intelligence, [S. l.], v. 39, n. 18, p. 19589–19597, 2025. DOI: 10.1609/aaai.v39i18.34157. Available at: https://ojs.aaai.org/index.php/AAAI/article/view/34157. Accessed on: 30 May 2026.