Eriksson, M., Purificato, E., Noroozian, A., Vinagre, J., Chaslot, G., Gomez, E., & Fernandez-Llorca, D. (2025). Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 8(1), 850–864. https://doi.org/10.1609/aies.v8i1.36595