QUARL: Quantifying Adversarial Risks in Language Models

Authors

  • Joshua Ackerman Dartmouth College
  • George Cybenko Dartmouth College
  • Paul Lintilhac Dartmouth College
  • Henry Scheible Dartmouth College
  • Nathaniel D. Bastian United States Military Academy

DOI:

https://doi.org/10.1609/aaaiss.v4i1.31777

Abstract

It is well documented that artificial intelligence (AI) systems have various types of vulnerabilities and associated risks. As such systems are deployed in safety-critical domains, it has become necessary not only to identify and enumerate these vulnerabilities but also to quantify the resulting risks. In this position paper, we discuss an approach to the challenge of quantifying AI risks. The approach is based on a general framework for testing and evaluating language model systems that we have previously developed, called TEL'M. In particular, we extend TEL'M to address the problem of quantifying the effort required by an adversary to discover and exploit a language model vulnerability.

Published

2024-11-08

How to Cite

Ackerman, J., Cybenko, G., Lintilhac, P., Scheible, H., & Bastian, N. D. (2024). QUARL: Quantifying Adversarial Risks in Language Models. Proceedings of the AAAI Symposium Series, 4(1), 98–101. https://doi.org/10.1609/aaaiss.v4i1.31777

Section

AI Trustworthiness and Risk Assessment for Challenging Contexts (ATRACC) - Short Papers