AI Evaluation Authorities: A Case Study Mapping Model Audits to Persistent Standards
DOI:
https://doi.org/10.1609/aaai.v38i21.30346
Keywords:
Track: AI Incidents and Best Practices (paper)
Abstract
Intelligent system audits are labor-intensive assurance activities that are typically performed once and then discarded, along with the opportunity to programmatically test all similar products for the market. This study illustrates how several incidents (i.e., harms) involving Named Entity Recognition (NER) could be prevented by scaling up a previously-performed audit of NER systems. The audit instrument's diagnostic capacity is maintained through a security model that protects the underlying data (i.e., addresses Goodhart's Law). An open-source evaluation infrastructure is released along with an example derived from a real-world audit that reports aggregated findings without exposing the underlying data.
Published
2024-03-24
How to Cite
Chadda, A., McGregor, S., Hostetler, J., & Brennen, A. (2024). AI Evaluation Authorities: A Case Study Mapping Model Audits to Persistent Standards. Proceedings of the AAAI Conference on Artificial Intelligence, 38(21), 23035–23040. https://doi.org/10.1609/aaai.v38i21.30346
Issue
Section
IAAI Technical Track on AI Incidents and Best Practices