Cyber-Agent-Flow: Execution Trace Instrumentation and Analysis for Cybersecurity Agent Workflows (Extended Abstract)

Authors

  • Jaime Acosta, DEVCOM Army Research Laboratory
  • Mohammad Taneem Nazim, University of Texas at El Paso
  • Thomas Guerra, University of Texas at El Paso
  • Yinuo Du, University of Texas at El Paso
  • Palvi Aggarwal, University of Texas at El Paso

DOI:

https://doi.org/10.1609/aaaiss.v9i1.42950

Abstract

Large language models (LLMs) are increasingly explored as agents capable of assisting with complex cybersecurity tasks such as reconnaissance, vulnerability discovery, and penetration testing. Recent systems demonstrate that LLMs can coordinate multi-step workflows by combining reasoning with tool execution. However, most current evaluations focus primarily on task completion and provide limited insight into how agents behave during real-world engagements. We present AgentFlow, an instrumentation framework designed to improve the observability and analysis of agent-driven workflows, and apply it to the cybersecurity domain. The framework records detailed execution traces capturing prompts, model responses, tool invocations, command outputs, and analyst annotations from penetration testing engagements. It integrates local cybersecurity tools through Model Context Protocol (MCP) interfaces while supporting privately hosted language models. Applying this framework to multi-step pivoting scenarios, we collect execution traces and analyze agent reasoning and tool orchestration. Observations reveal inefficiencies including redundant reasoning loops, repeated tool invocations, and suboptimal sequencing strategies, highlighting opportunities for improving the reliability and efficiency of agent-assisted cybersecurity operations.
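
To make the idea of an execution trace more concrete, the following is a minimal Python sketch of how trace events of the kinds listed in the abstract (prompts, model responses, tool invocations, command outputs, analyst annotations) might be represented, and how repeated tool invocations could be flagged. It is not the AgentFlow implementation: the TraceEvent class, its field names, the repeated_tool_calls helper, and the sample commands and addresses are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class TraceEvent:
    """One step in an agent run. All field names are illustrative assumptions."""
    step: int
    kind: str                    # "prompt", "response", "tool_call", "tool_output", "annotation"
    content: str
    tool: Optional[str] = None   # set only for tool-related events
    args: tuple = ()             # tool arguments as a hashable tuple


def repeated_tool_calls(trace, min_count=2):
    """Return {(tool, args): count} for tool calls issued at least min_count times."""
    counts = Counter((e.tool, e.args) for e in trace if e.kind == "tool_call")
    return {key: n for key, n in counts.items() if n >= min_count}


# A made-up trace in which the agent runs the same scan twice.
trace = [
    TraceEvent(0, "prompt", "Enumerate services on the pivot host."),
    TraceEvent(1, "tool_call", "nmap -sV 10.0.0.5", tool="nmap", args=("-sV", "10.0.0.5")),
    TraceEvent(2, "tool_output", "22/tcp open ssh", tool="nmap"),
    TraceEvent(3, "response", "I should scan the host to find open services."),
    TraceEvent(4, "tool_call", "nmap -sV 10.0.0.5", tool="nmap", args=("-sV", "10.0.0.5")),
    TraceEvent(5, "annotation", "Analyst note: duplicate scan, no new information."),
]

duplicates = repeated_tool_calls(trace)
```

Counting exact (tool, arguments) pairs is the simplest possible notion of a repeated invocation. Detecting redundant reasoning loops or suboptimal sequencing would need richer analysis than this sketch attempts.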

Published

2026-06-23

How to Cite

Acosta, J., Taneem Nazim, M., Guerra, T., Du, Y., & Aggarwal, P. (2026). Cyber-Agent-Flow: Execution Trace Instrumentation and Analysis for Cybersecurity Agent Workflows (Extended Abstract). Proceedings of the AAAI Symposium Series, 9(1), 337–340. https://doi.org/10.1609/aaaiss.v9i1.42950

Section

Human-Aware AI Agents for the Cyber Battlefield: From Human Models to Autonomous Defense (Extended Abstracts)