Agent-Fence: Mapping Security Vulnerabilities Across Deep Research Agents

Sai Puppala; Ismail Hossain; Md Jahangir Alam; Yoonpyo Lee; Jay Yoo; Tanzim Ahad; Syed Bahauddin Alam; Sajedul Talukder

doi:10.1609/aaaiss.v9i1.42945

Authors

Sai Puppala Southern Illinois University
Ismail Hossain University of Texas at El Paso
Md Jahangir Alam University of Texas at El Paso
Yoonpyo Lee Hanyang University
Jay Yoo University of Illinois Urbana-Champaign
Tanzim Ahad University of Texas at El Paso
Syed Bahauddin Alam University of Illinois Urbana-Champaign
Sajedul Talukder University of Texas at El Paso

Abstract

Large language models are becoming deep agents that plan, persist state, and invoke tools, shifting safety failures from unsafe text to unsafe trajectories. We introduce AgentFence, an architecture-centric security evaluation that defines 14 trust-boundary attack classes across planning, memory, retrieval, tool use, and delegation, and detects failure via trace-auditable conversation breaks: unauthorized or unsafe tool use, wrong-principal actions, state or objective integrity violations, and attack-linked deviations. Holding the base model fixed, we evaluate eight agent archetypes under persistent multi-turn interaction and find substantial architectural variation in mean security break rate (MSBR), from 0.29 ± 0.04 for LangGraph to 0.51 ± 0.07 for AutoGPT. The highest-risk classes are operational: Denial-of-Wallet at 0.62 ± 0.08, Authorization Confusion at 0.54 ± 0.10, Retrieval Poisoning at 0.47 ± 0.09, and Planning Manipulation at 0.44 ± 0.11, while prompt-centric classes remain below 0.20 under standard settings. Breaks are dominated by boundary violations: SIV 31%, WPA 27%, UTI plus UTA 24%, and ATD 18%. Authorization confusion correlates with objective and tool hijacking, with ρ ≈ 0.63 and ρ ≈ 0.58, respectively. AgentFence reframes agent security around what matters operationally: whether an agent stays within its goal and authority envelope over time.
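
The abstract reports a mean security break rate (MSBR) but does not spell out how it is computed. The sketch below shows one plausible reading, offered only as an illustration: it assumes each trial is labeled with an attack class and the set of break types observed in its trace, that a trial counts as broken if at least one break type is present, and that MSBR is the unweighted mean of per-class break rates. The function names, the trial format, and the attack-class strings are hypothetical; the break-type labels reuse the acronyms from the abstract without assuming their expansions.

```python
from collections import defaultdict
from statistics import mean

# Break-type acronyms as they appear in the abstract (expansions not assumed).
BREAK_TYPES = {"SIV", "WPA", "UTI", "UTA", "ATD"}


def break_rate_by_class(trials):
    """Return {attack_class: fraction of trials with at least one break}.

    `trials` is an iterable of (attack_class, break_types) pairs, where
    break_types is a set of labels drawn from BREAK_TYPES. This input
    format is an assumption made for illustration.
    """
    totals = defaultdict(int)
    broken = defaultdict(int)
    for attack_class, breaks in trials:
        unknown = set(breaks) - BREAK_TYPES
        if unknown:
            raise ValueError(f"unknown break types: {sorted(unknown)}")
        totals[attack_class] += 1
        if breaks:  # any recorded break marks the trial as broken
            broken[attack_class] += 1
    return {cls: broken[cls] / totals[cls] for cls in totals}


def mean_security_break_rate(trials):
    """Unweighted mean of per-class break rates (assumed MSBR definition)."""
    rates = break_rate_by_class(trials)
    return mean(list(rates.values()))


# Toy data, invented purely to exercise the functions.
example_trials = [
    ("denial_of_wallet", {"UTA"}),
    ("denial_of_wallet", set()),
    ("authorization_confusion", {"WPA", "SIV"}),
    ("authorization_confusion", {"WPA"}),
    ("retrieval_poisoning", set()),
    ("retrieval_poisoning", set()),
]

rates = break_rate_by_class(example_trials)
msbr = mean_security_break_rate(example_trials)
```

Averaging per class rather than per trial keeps a heavily sampled attack class from dominating the score; whether the paper weights classes this way is not stated in the abstract.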
