Eriksson, Maria, Erasmo Purificato, Arman Noroozian, João Vinagre, Guillaume Chaslot, Emilia Gomez, and David Fernandez-Llorca. 2025. “Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation.” Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society 8 (1): 850–64. https://doi.org/10.1609/aies.v8i1.36595.