Rating Composite AI Models for Robustness Through Probabilistic Planning
DOI:
https://doi.org/10.1609/icaps.v36i1.42867
Abstract
Many real-world AI systems combine several primitive component models, such as translators and sentiment analyzers, into larger composite models, like chatbots. Understanding how these compositions behave under uncertainty, and how properties like bias or instability propagate through a composite model, is increasingly important, yet most evaluation methods still focus on primitive models. We introduce a new use of probabilistic planning to assess the robustness of composite AI models. Each component model call is represented as a stochastic action in an RDDL domain, and the reward combines robustness metrics with the cost of the components (actions). The planner runs each primitive model on randomly drawn data batches, allowing robustness to be assessed under variation in both the data and the model outputs induced by those data. We demonstrate, via case studies and experiments in multilingual sentiment analysis and a synthetic domain, that the planner consistently identifies more stable composite configurations than baseline methods, showing that probabilistic planning can serve as a practical, scalable approach for reasoning about reliability in complex, composite AI models.
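To make the reward structure described in the abstract concrete, the following is a minimal Python sketch, not the authors' RDDL encoding or planner. It treats each component call as a stochastic action that returns a noisy instability score for a randomly drawn batch, and it rates each candidate pipeline by a Monte Carlo estimate of expected reward, where reward penalizes both instability and component cost. The component names, costs, instability values, the Gaussian noise model, and `cost_weight` are all hypothetical and chosen only for illustration; exhaustive enumeration stands in for planning.

```python
import random
from statistics import mean

# Hypothetical components with illustrative (made-up) cost and instability values.
COMPONENTS = {
    "translator_a": {"cost": 1.0, "instability": 0.30, "spread": 0.10},
    "translator_b": {"cost": 2.0, "instability": 0.10, "spread": 0.05},
    "sentiment_a": {"cost": 1.0, "instability": 0.25, "spread": 0.10},
    "sentiment_b": {"cost": 1.5, "instability": 0.15, "spread": 0.05},
}


def call_component(name, rng):
    """Stochastic action: one call on a random batch yields a noisy instability score in [0, 1]."""
    spec = COMPONENTS[name]
    score = rng.gauss(spec["instability"], spec["spread"])
    return min(1.0, max(0.0, score))


def episode_reward(pipeline, rng, cost_weight=0.1):
    """Reward for one sampled run: negative sum of instability plus weighted cost per action."""
    total = 0.0
    for name in pipeline:
        total -= call_component(name, rng) + cost_weight * COMPONENTS[name]["cost"]
    return total


def rate_pipelines(pipelines, episodes=2000, seed=0):
    """Estimate expected reward per pipeline by averaging over sampled episodes."""
    rng = random.Random(seed)
    return {p: mean(episode_reward(p, rng) for _ in range(episodes)) for p in pipelines}


# Enumerate every translator/sentiment pairing as a candidate composite configuration.
CANDIDATES = [
    (t, s)
    for t in ("translator_a", "translator_b")
    for s in ("sentiment_a", "sentiment_b")
]
ratings = rate_pipelines(CANDIDATES)
best = max(ratings, key=ratings.get)
```

Here `ratings` maps each candidate configuration to its estimated expected reward, and `best` is the configuration with the highest estimate. A probabilistic planner would search this space over longer action sequences rather than enumerating it.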
Published
2026-06-08
How to Cite
Lakkaraju, K., Patra, S., Zehtabi, P., & Srivastava, B. (2026). Rating Composite AI Models for Robustness Through Probabilistic Planning. Proceedings of the International Conference on Automated Planning and Scheduling, 36(1), 500–508. https://doi.org/10.1609/icaps.v36i1.42867