Gaps in the Safety Evaluation of Generative AI

Authors

  • Maribeth Rauh, Google DeepMind
  • Nahema Marchal, Google DeepMind
  • Arianna Manzini, Google DeepMind
  • Lisa Anne Hendricks, Google DeepMind
  • Ramona Comanescu, Google DeepMind
  • Canfer Akbulut, Google DeepMind
  • Tom Stepleton, Google DeepMind
  • Juan Mateos-Garcia, Google DeepMind
  • Stevie Bergman, Google DeepMind
  • Jackie Kay, Google DeepMind
  • Conor Griffin, Google DeepMind
  • Ben Bariach, Google DeepMind
  • Iason Gabriel, Google DeepMind
  • Verena Rieser, Google DeepMind
  • William Isaac, Google DeepMind
  • Laura Weidinger, Google DeepMind

DOI:

https://doi.org/10.1609/aies.v7i1.31717

Abstract

Generative AI systems produce a range of ethical and social risks. Evaluation of these risks is a critical step on the path to ensuring the safety of these systems. However, evaluation requires validated and established measurement approaches and tools. In this paper, we provide an empirical review of the methods and tools available to date for evaluating the known safety risks of generative AI systems. To this end, we review more than 200 safety-related evaluations that have been applied to generative AI systems. We categorise each evaluation along multiple axes to create a detailed snapshot of the safety evaluation landscape, and we release this data for researchers and AI safety practitioners (https://bitly.ws/3hUzu). Analysing the current safety evaluation landscape reveals three systemic "evaluation gaps". First, a "modality gap" emerges because few safety evaluations exist for non-text modalities. Second, a "risk coverage gap" arises because evaluations for several ethical and social risks are simply lacking. Third, a "context gap" arises because most safety evaluations are model-centric and fail to take into account the broader context in which AI systems operate. Based on these findings, we present tactical "low-hanging fruit" steps that safety practitioners can take towards closing the identified evaluation gaps, and we note the limitations of these steps. We close by discussing the role and limitations of safety evaluation in ensuring the safety of generative AI systems.
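The released catalogue of evaluations lends itself to simple exploratory analysis of the gaps described above. Below is a minimal sketch, assuming a hypothetical CSV export of the catalogue with columns named modality and risk_area; the actual file name and column names in the released data may differ.

```python
import pandas as pd

# Hypothetical export of the released evaluation catalogue; the real file
# name and column names may differ from those assumed here.
catalog = pd.read_csv("safety_evaluations.csv")

# Tally evaluations per output modality to surface the "modality gap":
# text-only evaluations are expected to dominate.
modality_counts = catalog["modality"].value_counts()
print(modality_counts)

# Cross-tabulate risk area against modality to see where coverage is thin,
# illustrating the "risk coverage gap".
coverage = pd.crosstab(catalog["risk_area"], catalog["modality"])
print(coverage)
```

Sparse rows or columns in the cross-tabulation point to risk-modality combinations with little or no existing evaluation coverage.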

Published

2024-10-16

How to Cite

Rauh, M., Marchal, N., Manzini, A., Hendricks, L. A., Comanescu, R., Akbulut, C., … Weidinger, L. (2024). Gaps in the Safety Evaluation of Generative AI. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 7(1), 1200–1217. https://doi.org/10.1609/aies.v7i1.31717