Show Your Faith: Cross-Modal Confidence-Aware Network for Image-Text Matching

Huatian Zhang; Zhendong Mao; Kun Zhang; Yongdong Zhang

doi:10.1609/aaai.v36i3.20235

Authors

Huatian Zhang, University of Science and Technology of China
Zhendong Mao, University of Science and Technology of China
Kun Zhang, University of Science and Technology of China
Yongdong Zhang, University of Science and Technology of China

Keywords:

Computer Vision (CV)

Abstract

Image-text matching, which bridges vision and language, is a crucial task in multi-modal intelligence. The key challenge lies in accurately measuring image-text relevance as matching evidence. Most existing works aggregate the local semantic similarities of matched region-word pairs into the overall relevance, and they typically assume that all matched pairs are equally reliable. However, a region-word pair that is matched locally across modalities may still be inconsistent or unreliable from the global perspective of the whole image-text pair, which makes the relevance measurement inaccurate. In this paper, we propose a novel Cross-Modal Confidence-Aware Network that infers a matching confidence indicating the reliability of each matched region-word pair and combines it with the local semantic similarities to refine the relevance measurement. Specifically, we first calculate the matching confidence from the relevance between the semantics of an image region and the complete semantics described in the image, using the text as a bridge. Further, to express region semantics more richly, we extend each region to its visual context in the image. Then, the local semantic similarities are weighted by the inferred confidence so that unreliable matched pairs are filtered out during aggregation. Comprehensive experiments show that our method achieves state-of-the-art performance on the Flickr30K and MSCOCO benchmarks.
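The abstract describes the method only at a high level. As a rough illustration of its final step, where local region-word similarities are weighted by a matching confidence before aggregation, here is a minimal sketch in plain Python. Everything concrete in it is an assumption and not the paper's formulation: word-to-region softmax attention stands in for a region's visual context, the mean of the attended contexts stands in for the text-bridged "complete described semantics", confidence is a sigmoid of cosine agreement with that mean, and the function names and the `temperature` and `gamma` parameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def attend(query: Vector, keys: Sequence[Vector], temperature: float) -> List[float]:
    """Softmax attention of one word over all regions (cosine scores scaled by
    `temperature`); returns the attention-weighted sum of region vectors."""
    weights = softmax([temperature * cosine(query, k) for k in keys])
    dim = len(keys[0])
    return [sum(w * k[d] for w, k in zip(weights, keys)) for d in range(dim)]


def confidence_weighted_relevance(
    regions: Sequence[Vector],
    words: Sequence[Vector],
    temperature: float = 9.0,  # hypothetical attention sharpness
    gamma: float = 5.0,        # hypothetical confidence scale
) -> Tuple[float, List[float], List[float]]:
    """Return (overall relevance, local similarities, confidences).

    Hypothetical simplification, not the paper's exact formulation.
    """
    # 1. Visual context matched to each word: attention-weighted mix of regions.
    contexts = [attend(w, regions, temperature) for w in words]
    # 2. Local semantic similarity of each matched word-context pair.
    local = [cosine(w, c) for w, c in zip(words, contexts)]
    # 3. Assumed stand-in for the text-bridged "complete described semantics":
    #    the mean of all word-attended contexts.
    dim = len(regions[0])
    described = [sum(c[d] for c in contexts) / len(contexts) for d in range(dim)]
    # 4. Confidence: agreement of each context with the global description.
    conf = [sigmoid(gamma * cosine(c, described)) for c in contexts]
    # 5. Confidence-weighted average of the local similarities.
    score = sum(c * s for c, s in zip(conf, local)) / sum(conf)
    return score, local, conf


if __name__ == "__main__":
    regions = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
    words = [[1.0, 0.0, 0.0], [0.95, 0.05, 0.0], [0.0, 0.0, 1.0]]
    score, local, conf = confidence_weighted_relevance(regions, words)
    print(f"relevance={score:.3f}")
    print("local:", [round(s, 3) for s in local])
    print("confidence:", [round(c, 3) for c in conf])
```

In the demo, the third word matches its region well locally, yet its context agrees less with the overall text-conditioned description, so it receives a lower confidence and contributes less to the final score. This mirrors, in toy form, the abstract's point that a locally matched pair can still be unreliable from the global perspective. When all confidences are equal, the aggregate reduces to the plain mean of the local similarities, which corresponds to the "equally reliable" assumption the abstract attributes to earlier work.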
