Identification of Necessary Semantic Undertakers in the Causal View for Image-Text Matching

Authors

  • Huatian Zhang, University of Science and Technology of China
  • Lei Zhang, University of Science and Technology of China
  • Kun Zhang, University of Science and Technology of China
  • Zhendong Mao, University of Science and Technology of China

DOI:

https://doi.org/10.1609/aaai.v38i7.28538

Keywords:

CV: Language and Vision, ML: Multimodal Learning, ML: Causal Learning

Abstract

Image-text matching, a fundamental task in multimodal intelligence, bridges vision and language. Its key challenge lies in capturing visual-semantic relevance. Fine-grained semantic interactions arise from fragment alignments between image regions and text words. However, not all fragments contribute to image-text relevance, and many existing methods are devoted to mining the vital ones so as to measure relevance accurately. How well an image and a text relate depends on the degree of semantic sharing between them. Treating this degree as an effect and fragments as its possible causes, we define the causes indispensable to generating the degree as necessary undertakers, i.e., if any of them had not occurred, the relevance would no longer hold. In this paper, we revisit image-text matching from the causal view and uncover inherent causal properties of relevance generation. We then propose a novel theoretical prototype that estimates the probability-of-necessity of fragments, PN_f, for the degree of semantic sharing by means of causal inference, and further design a Necessary Undertaker Identification Framework (NUIF) for image-text matching, which explicitly formalizes each fragment's contribution to image-text relevance by modeling PN_f in two ways. Extensive experiments show that our method achieves state-of-the-art performance on the Flickr30K and MSCOCO benchmarks.
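For background, the abstract's notion of a "necessary undertaker" (a cause without which the effect would not occur) mirrors Pearl's standard probability of necessity from causal inference. A minimal sketch of that definition, which PN_f presumably adapts to fragments (the binding of X to a fragment's occurrence and Y to image-text relevance is our illustrative reading, not the paper's exact formulation):

```latex
% Pearl's probability of necessity: given that cause X = x occurred and
% effect Y = y was observed, the probability that Y would have been y'
% (i.e., the effect absent) under the counterfactual intervention X = x'.
\mathrm{PN} \;=\; P\!\left(Y_{x'} = y' \;\middle|\; X = x,\; Y = y\right)
```

Intuitively, a fragment with high PN_f is one whose removal would, counterfactually, invalidate the observed image-text relevance.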

Published

2024-03-24

How to Cite

Zhang, H., Zhang, L., Zhang, K., & Mao, Z. (2024). Identification of Necessary Semantic Undertakers in the Causal View for Image-Text Matching. Proceedings of the AAAI Conference on Artificial Intelligence, 38(7), 7105-7114. https://doi.org/10.1609/aaai.v38i7.28538

Section

AAAI Technical Track on Computer Vision VI