QUARL: Quantifying Adversarial Risks in Language Models
DOI:
https://doi.org/10.1609/aaaiss.v4i1.31777

Abstract
It is well documented that artificial intelligence (AI) systems have various types of vulnerabilities and associated risks. As such systems are deployed in safety-critical domains, it has become necessary not only to identify and enumerate these vulnerabilities but also to quantify the resulting risks. In this position paper, we discuss approaches to the challenge of quantifying AI risks. Our approach builds on a general framework for testing and evaluating language model systems that we have previously developed, called TEL'M. In particular, we extend TEL'M to address the problem of quantifying the effort required by an adversary to discover and exploit a language model vulnerability.
Published
2024-11-08
How to Cite
Ackerman, J., Cybenko, G., Lintilhac, P., Scheible, H., & Bastian, N. D. (2024). QUARL: Quantifying Adversarial Risks in Language Models. Proceedings of the AAAI Symposium Series, 4(1), 98–101. https://doi.org/10.1609/aaaiss.v4i1.31777
Section
AI Trustworthiness and Risk Assessment for Challenging Contexts (ATRACC) - Short Papers