From Benchmarks to Business Impact: Deploying IBM Generalist Agent in Enterprise Production

Authors

  • Segev Shlomov IBM Research
  • Alon Oved IBM Research
  • Sami Marreed IBM Research
  • Ido Levy IBM Research
  • Offer Akrabi IBM Research
  • Avi Yaeli IBM Research
  • Łukasz Strąk IBM Consulting
  • Elizabeth Koumpan IBM Consulting
  • Yinon Goldshtein IBM Research
  • Eilam Shapira IBM Research
  • Nir Mashkif IBM Research
  • Asaf Adi IBM Research

DOI:

https://doi.org/10.1609/aaai.v40i47.41485

Abstract

Agents are rapidly advancing in automating digital work, but enterprises face a harder challenge: moving beyond prototypes to deployed systems that deliver measurable business value. This path is complicated by fragmented frameworks, slow development, and the absence of standardized evaluation practices. Generalist agents have emerged as a promising direction, excelling on academic benchmarks and offering flexibility across tasks, applications, and modalities. Yet evidence of their use in enterprise settings remains limited. This paper reports IBM's experience developing and piloting the Computer Using Generalist Agent (CUGA). CUGA adopts a hierarchical planner-executor architecture with strong analytical foundations and achieves state-of-the-art performance on AppWorld and WebArena. Beyond benchmarks, it was evaluated in a talent acquisition pilot in Business Process Outsourcing (BPO), which addressed enterprise requirements for scalability, auditability, safety, and governance. In preliminary evaluations, CUGA approached the accuracy of specialized agents and showed signs of reducing development time and cost. We provide early evidence that generalist agents can operate at enterprise scale, distill key technical and organizational lessons, and outline requirements for transitioning research-grade architectures like CUGA into enterprise-ready systems.

Published

2026-03-14

How to Cite

Shlomov, S., Oved, A., Marreed, S., Levy, I., Akrabi, O., Yaeli, A., … Adi, A. (2026). From Benchmarks to Business Impact: Deploying IBM Generalist Agent in Enterprise Production. Proceedings of the AAAI Conference on Artificial Intelligence, 40(47), 40423–40431. https://doi.org/10.1609/aaai.v40i47.41485

Issue

Vol. 40 No. 47

Section

IAAI Technical Track on Emerging Applications of AI