GAICo: Demonstrating a Unified Framework for Multi-Modal GenAI Evaluation

Authors

  • Pallav Koppisetti, University of South Carolina
  • Nitin Gupta, University of South Carolina
  • Kausik Lakkaraju, University of South Carolina
  • Biplav Srivastava, University of South Carolina

DOI:

https://doi.org/10.1609/aaai.v40i48.42360

Abstract

The rapid evolution of Generative AI, producing outputs across text, structured data, images, and audio, has outpaced the development of standardized evaluation tools, leading to fragmented and non-reproducible practices. GAICo (Generative AI Comparator) offers a solution: a deployed, open-source Python library that provides a unified, extensible, and reproducible framework for multi-modal GenAI evaluation. Our demonstration highlights GAICo’s utility through a practical case study: evaluating and debugging composite AI Travel Assistant pipelines. We show how GAICo isolates performance issues, for instance distinguishing an orchestrator LLM’s planning deficiencies from a specialist image model’s generation flaws, by consistently comparing diverse outputs against tailored references. This framework streamlines development, improves system reliability, and promotes reproducible evaluation, making it a critical tool for building safer and more effective AI. Its rapid adoption, evidenced by over 16,000 downloads in its first six months, underscores its relevance and impact within the AI community.
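The debugging workflow the abstract describes, scoring each pipeline component's output against a tailored reference so that the weak component stands out, can be sketched as follows. This is an illustrative, self-contained example, not GAICo's actual API: the function names and the simple token-overlap (Jaccard) metric are stand-ins for the library's metric suite.

```python
# Illustrative sketch (hypothetical names, NOT GAICo's API): score each
# component's text output against the same tailored reference with a
# token-overlap metric, so a low score flags the faulty component.

def jaccard_similarity(candidate: str, reference: str) -> float:
    """Token-level Jaccard similarity between a candidate and a reference."""
    cand = set(candidate.lower().split())
    ref = set(reference.lower().split())
    if not cand and not ref:
        return 1.0
    return len(cand & ref) / len(cand | ref)

def compare_outputs(outputs: dict[str, str], reference: str) -> dict[str, float]:
    """Score every named component output against one reference answer."""
    return {name: jaccard_similarity(text, reference)
            for name, text in outputs.items()}

# Toy Travel Assistant pipeline: the orchestrator's plan should match the
# reference plan; an unrelated image caption scores near zero.
outputs = {
    "orchestrator_plan": "book flight to Paris then reserve hotel",
    "specialist_caption": "a photo of the Eiffel Tower at sunset",
}
reference = "book a flight to Paris and reserve a hotel"
scores = compare_outputs(outputs, reference)
```

A real evaluation would swap the Jaccard stand-in for modality-appropriate metrics (text, image, audio) while keeping the same compare-against-reference loop, which is the unification the paper advertises.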

Published

2026-03-14

How to Cite

Koppisetti, P., Gupta, N., Lakkaraju, K., & Srivastava, B. (2026). GAICo: Demonstrating a Unified Framework for Multi-Modal GenAI Evaluation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(48), 41622-41624. https://doi.org/10.1609/aaai.v40i48.42360