Defeasible Visual Entailment: Benchmark, Evaluator, and Reward-Driven Optimization

Authors

  • Yue Zhang The University of Texas at Dallas
  • Liqiang Jing The University of Texas at Dallas
  • Vibhav Gogate The University of Texas at Dallas

DOI:

https://doi.org/10.1609/aaai.v39i24.34792

Abstract

We introduce a new task called Defeasible Visual Entailment (DVE), whose goal is to determine how an additional update modifies the entailment relationship between an image premise and a text hypothesis. While defeasible reasoning is well-established in Natural Language Inference, it remains unexplored in visual entailment. At a high level, DVE enables models to refine their initial interpretations, leading to improved accuracy and reliability in applications such as detecting misleading information in images, enhancing visual question answering, and refining decision-making in autonomous systems. Existing metrics do not adequately capture the change in the entailment relationship brought about by updates. To address this, we propose a novel inference-aware evaluator designed to capture changes in entailment strength induced by updates, using pairwise contrastive learning and categorical information learning. Additionally, we introduce a reward-driven update optimization method to further enhance the quality of updates generated by multimodal models. Experimental results demonstrate the effectiveness of our proposed evaluator and optimization method.
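The pairwise contrastive learning mentioned in the abstract can be illustrated, very roughly, as a margin ranking objective: for the same image-hypothesis pair, the evaluator should assign a higher entailment-strength score to a strengthener update than to a weakener. The sketch below is a minimal illustration under that assumption; the function name, margin value, and scoring setup are hypothetical and the paper's actual loss and evaluator architecture may differ.

```python
import numpy as np

def pairwise_margin_loss(score_strengthener, score_weakener, margin=0.5):
    """Hinge-style pairwise contrastive loss.

    Penalizes the evaluator whenever a strengthener update is not
    scored at least `margin` higher than a weakener update for the
    same image premise and text hypothesis.
    """
    return np.maximum(0.0, margin - (score_strengthener - score_weakener))

# Hypothetical evaluator scores for two (strengthener, weakener) pairs:
# pair 0 is ranked correctly with a wide gap, pair 1 is mis-ranked.
losses = pairwise_margin_loss(np.array([0.9, 0.4]), np.array([0.2, 0.6]))
print(losses)  # pair 0 incurs no loss; pair 1 is penalized
```

A correctly ranked pair with a gap larger than the margin contributes zero loss, so training pressure concentrates on pairs the evaluator currently ranks incorrectly or too narrowly.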

Published

2025-04-11

How to Cite

Zhang, Y., Jing, L., & Gogate, V. (2025). Defeasible Visual Entailment: Benchmark, Evaluator, and Reward-Driven Optimization. Proceedings of the AAAI Conference on Artificial Intelligence, 39(24), 25976–25984. https://doi.org/10.1609/aaai.v39i24.34792

Section

AAAI Technical Track on Natural Language Processing III