RFI: Rectified Flow Intervention for Mitigating Object Hallucination in Large Vision-Language Models

Junyu Cheng; Zhibiao Liang; Yidong Chen; Shuangyin Li

doi:10.1609/aaai.v40i5.37320

Authors

Junyu Cheng: Department of Artificial Intelligence, School of Informatics, Xiamen University, China
Zhibiao Liang: School of Computer Science, South China Normal University, China
Yidong Chen: Department of Artificial Intelligence, School of Informatics, Xiamen University, China; Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, China
Shuangyin Li: School of Computer Science, South China Normal University, China

Abstract

Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities in multimodal understanding and generation by integrating visual and textual data. However, these models frequently suffer from object hallucination: they generate outputs that are inconsistent with the input image. Existing mitigation methods have two key limitations: dynamic approaches based on logits or attention mechanisms risk suppressing valuable linguistic priors, whereas static methods that apply fixed intervention vectors cannot adapt to diverse images and questions. To address these issues, we propose RFI (Rectified Flow Intervention), an approach that uses the linear trajectory design of rectified flow for input-specific adaptation and employs gradient correction to ensure coherent generation, combining the adaptability of dynamic methods with the stability of static ones. RFI dynamically predicts latent-space intervention vectors while requiring only a single forward pass through the LVLM per question, which keeps it computationally efficient (1.09x latency overhead for 100 new tokens). Extensive experiments show that RFI significantly reduces hallucinations and outperforms existing advanced methods, highlighting its effectiveness as a lightweight plug-and-play method for reducing hallucination in LVLMs in practical applications.
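
The abstract does not state RFI's equations, so the following is only a minimal sketch of the generic rectified flow idea it refers to: a straight-line path between a source point and a target point, along which the velocity is constant. The function names (`interpolate`, `target_velocity`, `euler_integrate`) and the toy vectors are illustrative assumptions, not the authors' implementation; the gradient correction step and the LVLM itself are not modeled here.

```python
# Generic rectified flow on toy vectors (illustrative only, not the RFI code).

def interpolate(x0, x1, t):
    """Point on the straight line from x0 (t = 0) to x1 (t = 1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Constant velocity of the straight path; the usual regression target
    when a rectified flow velocity model is trained."""
    return [b - a for a, b in zip(x0, x1)]

def euler_integrate(x0, velocity_fn, num_steps=1):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# Hypothetical source and target latent vectors (assumed values).
x0 = [0.0, 1.0, -2.0]
x1 = [1.0, 3.0, 0.5]

# An "oracle" velocity field that returns the exact straight-line velocity.
oracle = lambda x, t: target_velocity(x0, x1)

midpoint = interpolate(x0, x1, 0.5)
x_end = euler_integrate(x0, oracle, num_steps=1)
```

Because the path is linear, a single Euler step with the exact velocity already lands on the target in this toy setting. How RFI conditions the predicted vector on the image and question, and where in the model it is applied, is not specified in the abstract.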
