ERIKSSON, Maria; PURIFICATO, Erasmo; NOROOZIAN, Arman; VINAGRE, João; CHASLOT, Guillaume; GOMEZ, Emilia; FERNANDEZ-LLORCA, David. Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, [S. l.], v. 8, n. 1, p. 850–864, 2025. DOI: 10.1609/aies.v8i1.36595. Available at: https://ojs.aaai.org/index.php/AIES/article/view/36595. Accessed: 29 May 2026.