Uncovering Systemic and Environment Errors in Autonomous Systems Using Differential Testing
DOI:
https://doi.org/10.1609/aaaiss.v7i1.36877
Abstract
Deploying autonomous agents in complex environments requires distinguishing between undesirable behaviors caused by imprecision in the agent's reasoning model or its policy (i.e., systemic agent error) and those due to inherently unsolvable tasks (environment error). We introduce AIProbe, a novel black-box differential testing framework to validate autonomous agents under varied and challenging environment configurations. We first describe how AIProbe generates diverse environment configurations and tasks for testing the agent by modifying configurable parameters using Latin Hypercube sampling. AIProbe then solves each generated task using a search-based planner, independent of the agent. By comparing the agent's performance to the planner's solution, AIProbe identifies whether failures are due to errors in the agent's model or policy, or due to unsolvable task conditions. We then demonstrate its broad applicability to both model-free and model-based agents operating in discrete and continuous domains. Our evaluation across multiple domains shows that AIProbe significantly outperforms state-of-the-art techniques in detecting unique errors, thereby contributing to the reliable deployment of autonomous agents.
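The pipeline described in the abstract (sample configurations, solve each task with an independent planner, compare against the agent) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from AIProbe itself: the grid task with a single wall column, the names `latin_hypercube`, `make_task`, `planner_solvable`, `naive_agent_succeeds`, and `classify`, and the stand-in agent that only walks right. Breadth-first search plays the role of the search-based planner.

```python
import random
from collections import Counter, deque


def latin_hypercube(n_samples, n_dims, rng):
    """Sample n_samples points in [0, 1)^n_dims with one point per stratum per dimension."""
    columns = []
    for _ in range(n_dims):
        # One value inside each of n_samples equal-width strata, in shuffled order.
        col = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(col)
        columns.append(col)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


def make_task(point, size=6):
    """Map a unit-cube sample to a hypothetical grid task (illustrative only)."""
    u_wall, u_gap, u_start = point
    wall_col = 1 + int(u_wall * (size - 2))  # wall sits in columns 1..size-2
    gap_row = int(u_gap * (size + 2))        # values >= size mean the wall has no gap
    start_row = int(u_start * size)
    blocked = {(r, wall_col) for r in range(size) if r != gap_row}
    return {
        "size": size,
        "blocked": blocked,
        "start": (start_row, 0),
        "goal": (start_row, size - 1),
    }


def planner_solvable(task):
    """Breadth-first search over the grid, run independently of the agent."""
    size, blocked = task["size"], task["blocked"]
    frontier, seen = deque([task["start"]]), {task["start"]}
    while frontier:
        r, c = frontier.popleft()
        if (r, c) == task["goal"]:
            return True
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r + dr, c + dc)
            in_bounds = 0 <= nxt[0] < size and 0 <= nxt[1] < size
            if in_bounds and nxt not in blocked and nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False


def naive_agent_succeeds(task):
    """Stand-in black-box agent: walks straight right and fails on any obstacle."""
    r, c = task["start"]
    while (r, c) != task["goal"]:
        c += 1
        if (r, c) in task["blocked"]:
            return False
    return True


def classify(task, agent):
    """Differential check: an agent failure is blamed on the agent only if the planner succeeds."""
    if agent(task):
        return "pass"
    return "agent_error" if planner_solvable(task) else "environment_error"


rng = random.Random(0)
tasks = [make_task(p) for p in latin_hypercube(16, 3, rng)]
report = Counter(classify(t, naive_agent_succeeds) for t in tasks)
print(dict(report))
```

In this sketch, a failure counts as an agent error only when the planner finds a path that the agent missed. If the planner also fails, the task is treated as unsolvable and the failure is attributed to the environment.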
Published
2025-11-23
How to Cite
Anand, Y., Mehta, R. P., Motwani, M., & Saisubramanian, S. (2025). Uncovering Systemic and Environment Errors in Autonomous Systems Using Differential Testing. Proceedings of the AAAI Symposium Series, 7(1), 122–130. https://doi.org/10.1609/aaaiss.v7i1.36877
Section
AI Trustworthiness and Risk Assessment for Challenged Contexts (ATRACC)