The PPOu Framework: A Structured Approach for Assessing the Likelihood of Malicious Use of Advanced AI Systems

Authors

  • Josh A. Goldstein, Georgetown University, Center for Security and Emerging Technology
  • Girish Sastry, OpenAI

DOI:

https://doi.org/10.1609/aies.v7i1.31653

Abstract

The diffusion of increasingly capable AI systems has produced concern that bad actors could intentionally misuse current or future AI systems for harm. Governments have begun to create new entities—such as AI Safety Institutes—tasked with assessing these risks. However, approaches to risk assessment are currently fragmented and would benefit from broader disciplinary expertise. As it stands, it is often unclear whether concerns about malicious use misestimate the likelihood and severity of the risks. This article advances a conceptual framework to review and structure investigation into the likelihood of an AI system (X) being applied to a malicious use (Y). We introduce a three-stage framework of (1) Plausibility (can X be used to do Y at all?), (2) Performance (how well does X do Y?), and (3) Observed use (do actors use X to do Y in practice?). At each stage, we outline key research questions, methodologies, benefits and limitations, and the types of uncertainty addressed. We also offer directions for improving risk assessment moving forward.

Published

2024-10-16

How to Cite

Goldstein, J. A., & Sastry, G. (2024). The PPOu Framework: A Structured Approach for Assessing the Likelihood of Malicious Use of Advanced AI Systems. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 7(1), 503-518. https://doi.org/10.1609/aies.v7i1.31653