AI Evaluation Authorities: A Case Study Mapping Model Audits to Persistent Standards
DOI:
https://doi.org/10.1609/aaai.v38i21.30346
Keywords:
Track: AI Incidents and Best Practices (paper)
Abstract
Intelligent system audits are labor-intensive assurance activities that are typically performed once and then discarded, along with the opportunity to programmatically test all similar products for the market. This study illustrates how several incidents (i.e., harms) involving Named Entity Recognition (NER) could be prevented by scaling up a previously-performed audit of NER systems. The audit instrument's diagnostic capacity is maintained through a security model that protects the underlying data (i.e., addresses Goodhart's Law). An open-source evaluation infrastructure is released along with an example derived from a real-world audit that reports aggregated findings without exposing the underlying data.
Published
2024-03-24
How to Cite
Chadda, A., McGregor, S., Hostetler, J., & Brennen, A. (2024). AI Evaluation Authorities: A Case Study Mapping Model Audits to Persistent Standards. Proceedings of the AAAI Conference on Artificial Intelligence, 38(21), 23035–23040. https://doi.org/10.1609/aaai.v38i21.30346
Issue
Section
IAAI Technical Track on AI Incidents and Best Practices