Mitigating Large Vision-Language Model Hallucination at Post-hoc via Multi-agent System

Chung-En (Johnny) Yu; Brian Jalaian; Nathaniel D. Bastian

doi:10.1609/aaaiss.v4i1.31780

Authors

Chung-En (Johnny) Yu University of West Florida
Brian Jalaian University of West Florida
Nathaniel D. Bastian United States Military Academy

DOI:

https://doi.org/10.1609/aaaiss.v4i1.31780

Abstract

This paper addresses the critical issue of hallucination in Large Vision-Language Models (LVLMs) by proposing a novel multi-agent framework. We integrate three post-hoc correction techniques (self-correction, external feedback, and agent debate) to enhance LVLM trustworthiness. Our approach tackles key challenges underlying LVLM hallucination, including weak visual encoders, parametric knowledge bias, and loss of visual attention during inference. The framework employs a Plug-in LVLM as the base model whose hallucinations are to be reduced, a Large Language Model (LLM) for guided refinement, external toolbox models for factual grounding, and an agent debate system for consensus-building. Although the approach is promising, we also discuss its potential limitations and the technical challenges of implementing such a complex system. This work contributes to the ongoing effort to create more reliable and trustworthy multimodal multi-agent systems.
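The abstract names four components but does not specify how they interact. Below is a minimal Python sketch of one plausible post-hoc pipeline, assuming each component is a plain callable. The function name `post_hoc_correct`, the type aliases, and the majority-vote consensus rule are hypothetical illustrations, not the authors' implementation; a single vote over independently refined captions stands in here for a multi-round agent debate.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical component interfaces (assumptions, not taken from the paper):
LVLM = Callable[[str], str]                # plug-in LVLM: image path -> draft caption
Tool = Callable[[str], List[str]]          # toolbox model: image path -> grounded facts
Refiner = Callable[[str, List[str]], str]  # LLM agent: (draft, facts) -> revised caption


def post_hoc_correct(image: str, lvlm: LVLM, tools: List[Tool],
                     agents: List[Refiner]) -> str:
    """Correct an LVLM caption after it has been generated (post hoc)."""
    if not agents:
        raise ValueError("at least one refining agent is required")
    # 1. The base LVLM produces an initial, possibly hallucinated, caption.
    draft = lvlm(image)
    # 2. External feedback: collect grounded facts from every toolbox model.
    facts = [fact for tool in tools for fact in tool(image)]
    # 3. Guided refinement: each agent revises the draft against the facts.
    proposals = [agent(draft, facts) for agent in agents]
    # 4. Consensus (simplified stand-in for debate): keep the most frequent
    #    proposal; Counter breaks ties in favour of the first one encountered.
    winner, _count = Counter(proposals).most_common(1)[0]
    return winner
```

With stub agents, for example two that rewrite the caption from detected objects and one that returns the draft unchanged, the function returns the caption proposed by the two agreeing agents. A fuller system would replace the stubs with model calls and could run several debate rounds in which each agent sees its peers' proposals before revising.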

Published

2024-11-08

How to Cite

Yu, C.-E. (Johnny), Jalaian, B., & Bastian, N. D. (2024). Mitigating Large Vision-Language Model Hallucination at Post-hoc via Multi-agent System. Proceedings of the AAAI Symposium Series, 4(1), 110–113. https://doi.org/10.1609/aaaiss.v4i1.31780

Issue

Vol. 4 No. 1: Proceedings of the 2024 AAAI Fall Symposia

Section

AI Trustworthiness and Risk Assessment for Challenging Contexts (ATRACC) - Short Papers