QUARL: Quantifying Adversarial Risks in Language Models

Authors

  • Joshua Ackerman Dartmouth College
  • George Cybenko Dartmouth College
  • Paul Lintilhac Dartmouth College
  • Henry Scheible Dartmouth College
  • Nathaniel D. Bastian United States Military Academy

DOI:

https://doi.org/10.1609/aaaiss.v4i1.31777

Abstract

It is well documented that artificial intelligence (AI) systems have various types of vulnerabilities and associated risks. As such systems are deployed in safety-critical domains, it has become necessary not only to identify and enumerate these vulnerabilities but also to quantify the resulting risks. In this position paper, we discuss an approach to the challenge of quantifying AI risks. The approach is based on a general framework for testing and evaluating language model systems that we have previously developed, called TEL'M. In particular, we extend TEL'M to address the problem of quantifying the effort required by an adversary to discover and exploit a language model vulnerability.

Published

2024-11-08

How to Cite

Ackerman, J., Cybenko, G., Lintilhac, P., Scheible, H., & Bastian, N. D. (2024). QUARL: Quantifying Adversarial Risks in Language Models. Proceedings of the AAAI Symposium Series, 4(1), 98–101. https://doi.org/10.1609/aaaiss.v4i1.31777

Section

AI Trustworthiness and Risk Assessment for Challenging Contexts (ATRACC) - Short Papers