The PPOu Framework: A Structured Approach for Assessing the Likelihood of Malicious Use of Advanced AI Systems
DOI: https://doi.org/10.1609/aies.v7i1.31653
Abstract
The diffusion of increasingly capable AI systems has produced concern that bad actors could intentionally misuse current or future AI systems for harm. Governments have begun to create new entities—such as AI Safety Institutes—tasked with assessing these risks. However, approaches for risk assessment are currently fragmented and would benefit from broader disciplinary expertise. As it stands, it is often unclear whether concerns about malicious use misestimate the likelihood and severity of the risks. This article advances a conceptual framework to review and structure investigation into the likelihood of an AI system (X) being applied to a malicious use (Y). We introduce a three-stage framework of (1) Plausibility (can X be used to do Y at all?), (2) Performance (how well does X do Y?), and (3) Observed use (do actors use X to do Y in practice?). At each stage, we outline key research questions, methodologies, benefits and limitations, and the types of uncertainty addressed. We also offer ideas for directions to improve risk assessment moving forward.
Published
2024-10-16
How to Cite
Goldstein, J. A., & Sastry, G. (2024). The PPOu Framework: A Structured Approach for Assessing the Likelihood of Malicious Use of Advanced AI Systems. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 7(1), 503-518. https://doi.org/10.1609/aies.v7i1.31653
Section
Full Archival Papers