From Benchmarks to Business Impact: Deploying IBM Generalist Agent in Enterprise Production

Segev Shlomov; Alon Oved; Sami Marreed; Ido Levy; Offer Akrabi; Avi Yaeli; Łukasz Strąk; Elizabeth Koumpan; Yinon Goldshtein; Eilam Shapira; Nir Mashkif; Asaf Adi

doi:10.1609/aaai.v40i47.41485

Authors

Segev Shlomov IBM Research
Alon Oved IBM Research
Sami Marreed IBM Research
Ido Levy IBM Research
Offer Akrabi IBM Research
Avi Yaeli IBM Research
Łukasz Strąk IBM Consulting
Elizabeth Koumpan IBM Consulting
Yinon Goldshtein IBM Research
Eilam Shapira IBM Research
Nir Mashkif IBM Research
Asaf Adi IBM Research

DOI: https://doi.org/10.1609/aaai.v40i47.41485

Abstract

Agents are rapidly advancing in automating digital work, but enterprises face a harder challenge: moving beyond prototypes to deployed systems that deliver measurable business value. This path is complicated by fragmented frameworks, slow development, and the absence of standardized evaluation practices. Generalist agents have emerged as a promising direction, excelling on academic benchmarks and offering flexibility across tasks, applications, and modalities. Yet, evidence of their use in enterprise settings remains limited. This paper reports IBM’s experience developing and piloting the Computer Using Generalist Agent (CUGA). CUGA adopts a hierarchical planner-executor architecture with strong analytical foundations, achieving state-of-the-art performance on AppWorld and WebArena. Beyond benchmarks, it was evaluated in a Business-Process-Outsourcing talent acquisition pilot, addressing enterprise requirements for scalability, auditability, safety, and governance. In preliminary evaluations, CUGA approached the accuracy of specialized agents while suggesting reductions in development time and cost. We provide early evidence that generalist agents can operate at enterprise scale, distill key technical and organizational lessons, and outline requirements for transitioning research-grade architectures like CUGA into enterprise-ready systems.
