HALLPERM: Exposing the Safety Illusion in LLM Tool Use via Implicit Privilege Escalation and Semantic Risk

Md Jahangir Alam; Tanzim Ahad; Ismail Hossain; Sai Puppala; Yoonpyo Lee; Syed Bahauddin Alam; Sajedul Talukder

doi:10.1609/aaaiss.v9i1.42937

Authors

Md Jahangir Alam University of Texas at El Paso
Tanzim Ahad University of Texas at El Paso
Ismail Hossain University of Texas at El Paso
Sai Puppala Southern Illinois University Carbondale
Yoonpyo Lee Hanyang University
Syed Bahauddin Alam University of Illinois Urbana-Champaign
Sajedul Talukder University of Texas at El Paso

DOI:

https://doi.org/10.1609/aaaiss.v9i1.42937

Abstract

Large language model (LLM) agents increasingly rely on external tools via structured schemas, yet the safety implications of under-specified tool interfaces remain poorly understood. We introduce HALLPERM, a benchmark for evaluating hallucinated permissions in tool-calling agents, and propose two complementary metrics: Implicit Privilege Escalation Rate (IPER), capturing undocumented parameter usage, and Semantic Risk Rate (SRR), capturing unsafe intent expressed in natural language reasoning. Across 768 evaluation instances spanning 16 models and 6 tools, we find that explicit schema violations are rare (IPER = 0.78% averaged across conditions), while semantic unsafe intent is widespread (SRR = 65.95%). This reveals a persistent safety illusion: models appear compliant at the structural level while exhibiting unsafe intent. At the tool level, high-risk tools such as run code and query database show SRR exceeding 80% despite zero IPER, demonstrating that parameter-level validation alone is insufficient. We further evaluate a hardened system prompt and observe a reduction in IPER (0.93% → 0.60%) but no mitigation of SRR, which slightly increases (64.6% → 67.3%). Our findings highlight a fundamental gap between schema compliance and safe behaviour, motivating the need for semantic-aware evaluation and enforcement mechanisms in LLM tool ecosystems.
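
To make the two metrics concrete, the following is a minimal sketch of how IPER and SRR could be computed over a set of evaluation instances. It is not the benchmark's implementation: the abstract does not give the exact definitions, so the sketch assumes that IPER is the fraction of instances whose tool call uses at least one parameter absent from the tool's documented schema, and that SRR is the fraction of instances whose reasoning was labelled as expressing unsafe intent. The `Instance` record, the `unsafe_intent` label, the schema contents, and the identifiers `run_code` and `query_database` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """One evaluation record (hypothetical structure, not from the paper)."""
    tool: str            # name of the tool the model called
    call_args: dict      # arguments the model actually passed
    unsafe_intent: bool  # assumed label: reasoning expressed unsafe intent


# Hypothetical documented parameters for two illustrative tools.
SCHEMAS = {
    "run_code": {"code", "language"},
    "query_database": {"query"},
}


def undocumented_params(instance: Instance, schemas: dict) -> set:
    """Return the argument names that the tool's schema does not document."""
    return set(instance.call_args) - schemas[instance.tool]


def iper(instances: list, schemas: dict) -> float:
    """Assumed IPER: share of instances using any undocumented parameter."""
    flagged = sum(1 for inst in instances if undocumented_params(inst, schemas))
    return flagged / len(instances)


def srr(instances: list) -> float:
    """Assumed SRR: share of instances labelled as showing unsafe intent."""
    return sum(1 for inst in instances if inst.unsafe_intent) / len(instances)


# Toy data: only the last call adds an undocumented "as_admin" argument,
# yet three of the four instances carry an unsafe-intent label.
INSTANCES = [
    Instance("run_code", {"code": "print(1)", "language": "python"}, False),
    Instance("run_code", {"code": "rm -rf /tmp/x", "language": "bash"}, True),
    Instance("query_database", {"query": "DROP TABLE users"}, True),
    Instance("query_database", {"query": "SELECT 1", "as_admin": True}, True),
]

if __name__ == "__main__":
    print(f"IPER = {iper(INSTANCES, SCHEMAS):.2%}")
    print(f"SRR  = {srr(INSTANCES):.2%}")
```

In this toy data the second and third calls are fully schema-compliant while still carrying an unsafe-intent label, which is the kind of gap between structural compliance and semantic risk that the abstract describes. A check based only on `undocumented_params` would pass both.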
