GAICo: Demonstrating a Unified Framework for Multi-Modal GenAI Evaluation
DOI:
https://doi.org/10.1609/aaai.v40i48.42360
Abstract
The rapid evolution of Generative AI, yielding outputs across text, structured data, images, and audio, has outpaced the development of standardized evaluation tools, leading to fragmented and non-reproducible practices. GAICo (Generative AI Comparator) offers a solution: a deployed, open-source Python library that provides a unified, extensible, and reproducible framework for multi-modal GenAI evaluation. Our demonstration highlights GAICo's utility through a practical case study: evaluating and debugging composite AI Travel Assistant pipelines. We show how GAICo facilitates isolating performance issues, for instance, distinguishing orchestrator LLM planning deficiencies from specialist image model generation flaws, by consistently comparing diverse outputs against tailored references. This framework streamlines development, improves system reliability, and promotes reproducible evaluation, making it a critical tool for building safer and more effective AI. Its rapid adoption, evidenced by over 16,000 downloads in the first 6 months, underscores its relevance and impact within the AI community.
Published
2026-03-14
How to Cite
Koppisetti, P., Gupta, N., Lakkaraju, K., & Srivastava, B. (2026). GAICo: Demonstrating a Unified Framework for Multi-Modal GenAI Evaluation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(48), 41622-41624. https://doi.org/10.1609/aaai.v40i48.42360