Gaps in the Safety Evaluation of Generative AI

Authors

  • Maribeth Rauh, Google DeepMind
  • Nahema Marchal, Google DeepMind
  • Arianna Manzini, Google DeepMind
  • Lisa Anne Hendricks, Google DeepMind
  • Ramona Comanescu, Google DeepMind
  • Canfer Akbulut, Google DeepMind
  • Tom Stepleton, Google DeepMind
  • Juan Mateos-Garcia, Google DeepMind
  • Stevie Bergman, Google DeepMind
  • Jackie Kay, Google DeepMind
  • Conor Griffin, Google DeepMind
  • Ben Bariach, Google DeepMind
  • Iason Gabriel, Google DeepMind
  • Verena Rieser, Google DeepMind
  • William Isaac, Google DeepMind
  • Laura Weidinger, Google DeepMind

DOI:

https://doi.org/10.1609/aies.v7i1.31717

Abstract

Generative AI systems produce a range of ethical and social risks. Evaluation of these risks is a critical step on the path to ensuring the safety of these systems. However, evaluation requires validated and established measurement approaches and tools. In this paper, we provide an empirical review of the methods and tools available to date for evaluating the known safety risks of generative AI systems. To this end, we review more than 200 safety-related evaluations that have been applied to generative AI systems. We categorise each evaluation along multiple axes to create a detailed snapshot of the safety evaluation landscape, and we release this data for researchers and AI safety practitioners (https://bitly.ws/3hUzu). Analysing the current safety evaluation landscape reveals three systemic "evaluation gaps". First, a "modality gap" emerges because few safety evaluations exist for non-text modalities. Second, a "risk coverage gap" arises because evaluations for several ethical and social risks are simply lacking. Third, a "context gap" arises because most safety evaluations are model-centric and fail to take into account the broader context in which AI systems operate. Based on these findings, we present tactical "low-hanging fruit" steps that safety practitioners can take towards closing the identified evaluation gaps, and we note the limitations of these steps. We close by discussing the role and limitations of safety evaluation in ensuring the safety of generative AI systems.
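The released catalogue of evaluations lends itself to simple exploratory analysis of the gaps described above. Below is a minimal sketch, assuming a hypothetical CSV export of the catalogue with columns named modality and risk_area; the actual file name and column names in the released data may differ.

```python
import pandas as pd

# Hypothetical export of the released evaluation catalogue; the real file
# name and column names may differ from those assumed here.
catalog = pd.read_csv("safety_evaluations.csv")

# Tally evaluations per output modality to surface the "modality gap":
# text-only evaluations are expected to dominate.
modality_counts = catalog["modality"].value_counts()
print(modality_counts)

# Cross-tabulate risk area against modality to see where coverage is thin,
# illustrating the "risk coverage gap".
coverage = pd.crosstab(catalog["risk_area"], catalog["modality"])
print(coverage)
```

Sparse rows or columns in the cross-tabulation point to risk-modality combinations with little or no existing evaluation coverage.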

Published

2024-10-16

How to Cite

Rauh, M., Marchal, N., Manzini, A., Hendricks, L. A., Comanescu, R., Akbulut, C., … Weidinger, L. (2024). Gaps in the Safety Evaluation of Generative AI. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 7(1), 1200–1217. https://doi.org/10.1609/aies.v7i1.31717