RFI: Rectified Flow Intervention for Mitigating Object Hallucination in Large Vision-Language Models

Authors

  • Junyu Cheng, Department of Artificial Intelligence, School of Informatics, Xiamen University, China
  • Zhibiao Liang, School of Computer Science, South China Normal University, China
  • Yidong Chen, Department of Artificial Intelligence, School of Informatics, Xiamen University, China; Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, China
  • Shuangyin Li, School of Computer Science, South China Normal University, China

DOI:

https://doi.org/10.1609/aaai.v40i5.37320

Abstract

Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities in multimodal understanding and generation by integrating visual and textual data. However, these models frequently exhibit object hallucination: they generate outputs that are inconsistent with the input image. Existing methods for mitigating hallucination still suffer from two key limitations: dynamic approaches based on logits or attention mechanisms risk suppressing valuable linguistic priors, whereas static methods that employ fixed intervention vectors lack the flexibility to adapt to diverse images and questions. To address these issues, we propose RFI (Rectified Flow Intervention), a novel approach that harnesses the linear trajectory design of rectified flow for input-specific adaptation and employs gradient correction to ensure coherent generation, effectively combining the adaptability of dynamic methods with the stability of static ones. RFI dynamically predicts latent-space intervention vectors while requiring only a single LVLM forward pass per question, which keeps it computationally efficient (1.09x latency overhead for 100 new tokens). Extensive experiments show that RFI significantly reduces hallucinations and outperforms existing advanced methods, highlighting its effectiveness as a lightweight plug-and-play method for reducing hallucination in LVLMs in practical applications.
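
The "linear trajectory design of rectified flow" mentioned in the abstract can be illustrated with the generic rectified-flow formulation: points on the path are x_t = (1 - t) * x0 + t * x1, and the velocity along that path is the constant x1 - x0. The sketch below is plain Python and is not the authors' implementation. The abstract does not say what the two endpoints represent in RFI, so x0 and x1 are placeholder vectors, and the function names and the "ideal" velocity field are assumptions made for illustration only.

```python
def interpolate(x0, x1, t):
    """Point on the straight line from x0 (t=0) to x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def velocity_target(x0, x1):
    """Velocity along the straight path; it is constant in t."""
    return [b - a for a, b in zip(x0, x1)]


def euler_integrate(x0, velocity_fn, steps=1):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Placeholder endpoints (hypothetical values, not taken from the paper).
x0 = [0.0, 1.0, -2.0]
x1 = [0.5, 0.5, 0.0]

# Stand-in for a learned velocity network: here it returns the exact
# straight-line velocity, which a trained model would only approximate.
ideal_v = velocity_target(x0, x1)
result = euler_integrate(x0, lambda x, t: ideal_v, steps=1)
midpoint = interpolate(x0, x1, 0.5)
```

Because the path is straight, a single Euler step with the exact velocity already lands on x1. This property is the usual reason rectified flow is cheap at inference time. How closely it relates to the latency figure reported in the abstract is not something the abstract itself specifies.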

Published

2026-03-14

How to Cite

Cheng, J., Liang, Z., Chen, Y., & Li, S. (2026). RFI: Rectified Flow Intervention for Mitigating Object Hallucination in Large Vision-Language Models. Proceedings of the AAAI Conference on Artificial Intelligence, 40(5), 3255–3263. https://doi.org/10.1609/aaai.v40i5.37320

Issue

Vol. 40 No. 5 (2026)

Section

AAAI Technical Track on Computer Vision II