AssetOpsBench-Live: Privacy-Aware Online Evaluation of Multi-Agent Performance in Industrial Operations

Dhaval Patel; Nianjun Zhou; Shuxin Lin; James Rayfield; Chathurangi Shyalika; Suryanarayana Reddy Yarrabothula

doi:10.1609/aaai.v40i48.42372

Authors

Dhaval Patel, IBM Research, Yorktown
Nianjun Zhou, IBM Research, Yorktown
Shuxin Lin, IBM Research, Yorktown
James Rayfield, IBM Research, Yorktown
Chathurangi Shyalika, Artificial Intelligence Institute, University of South Carolina
Suryanarayana Reddy Yarrabothula, Steel Authority of India Limited

DOI:

https://doi.org/10.1609/aaai.v40i48.42372

Abstract

Industrial automation increasingly relies on multi-agent AI, yet evaluation remains difficult due to task complexity and data confidentiality. We present AssetOpsBench-Live, a demo of a competition-ready platform for real-time, privacy-preserving evaluation of multi-agent AI in industrial contexts. The platform integrates AssetOpsBench, which measures six dimensions of multi-agent performance and performs automated failure-mode discovery, with Codabench, which supports reproducible, code-oriented competitions. End users first validate agents locally, then submit containerized code for execution on hidden industrial scenarios. Instead of raw trajectories, the system provides quantitative scores and clustered failure modes (e.g., reasoning-action mismatch, step repetition), enabling participants to identify failures, apply targeted improvements, and iteratively resubmit. By combining competition-based engagement with actionable diagnostics, AssetOpsBench-Live delivers reproducible, real-time insights reflecting real-world industrial constraints.
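
The feedback step described in the abstract (scores and clustered failure modes instead of raw trajectories) can be illustrated with a minimal sketch. This is not the platform's implementation: the `ScenarioResult` record, the `summarize` function, and the dimension names in the usage example are all hypothetical, and the clustering of failure modes is assumed to have already produced one label per observed failure.

```python
from collections import Counter
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class ScenarioResult:
    """Hypothetical per-scenario record held on the evaluation side."""
    scenario_id: str
    trajectory: list[str]                 # raw agent steps; never returned
    scores: dict[str, float]              # dimension name -> score
    failure_modes: list[str] = field(default_factory=list)


def summarize(results: list[ScenarioResult]) -> dict:
    """Build participant-facing feedback that contains aggregates only."""
    dimensions = sorted({d for r in results for d in r.scores})
    mean_scores = {
        d: round(mean(r.scores[d] for r in results if d in r.scores), 3)
        for d in dimensions
    }
    # Count how often each (already clustered) failure-mode label occurs.
    mode_counts = Counter(m for r in results for m in r.failure_modes)
    return {
        "num_scenarios": len(results),
        "mean_scores": mean_scores,
        "failure_modes": dict(mode_counts.most_common()),
    }


if __name__ == "__main__":
    # Illustrative data only; "dim_a" and "dim_b" are placeholder dimensions.
    demo = [
        ScenarioResult("s1", ["step 1", "step 1"], {"dim_a": 0.5, "dim_b": 1.0},
                       ["step repetition"]),
        ScenarioResult("s2", ["step 1"], {"dim_a": 1.0, "dim_b": 0.0},
                       ["step repetition", "reasoning-action mismatch"]),
    ]
    print(summarize(demo))
```

The returned dictionary carries no scenario identifiers or trajectory text, which is the property the abstract emphasizes: participants see where their agent fails in aggregate without seeing the hidden scenarios themselves.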

Published

2026-03-14

How to Cite

Patel, D., Zhou, N., Lin, S., Rayfield, J., Shyalika, C., & Yarrabothula, S. R. (2026). AssetOpsBench-Live: Privacy-Aware Online Evaluation of Multi-Agent Performance in Industrial Operations. Proceedings of the AAAI Conference on Artificial Intelligence, 40(48), 41658–41660. https://doi.org/10.1609/aaai.v40i48.42372

Issue

Vol. 40 No. 48: EAAI-26 AI for Education, Model AI Assignments, AAAI-26 Emerging Trends, Doctoral Consortium, Student Abstracts, Undergraduate Consortium and Demonstrations

Section

AAAI Demonstration Track
