Multi-Modal Knowledge Hypergraph for Diverse Image Retrieval

Authors

  • Yawen Zeng ByteDance AI Lab
  • Qin Jin Renmin University of China
  • Tengfei Bao ByteDance AI Lab
  • Wenfeng Li ByteDance AI Lab

DOI:

https://doi.org/10.1609/aaai.v37i3.25445

Keywords:

CV: Image and Video Retrieval, DMKM: Mining of Visual, Multimedia & Multimodal Data, DMKM: Web Search & Information Retrieval

Abstract

Keyword-based diverse image retrieval has received considerable attention due to its wide demand in real-world scenarios. Existing methods either rely on a hand-designed multi-stage re-ranking strategy to diversify results, or extend sub-semantics via an implicit generator; the former depends on manual labor, while the latter lacks explainability. To learn more diverse and explainable representations, we capture sub-semantics explicitly by leveraging a multi-modal knowledge graph (MMKG), which contains rich entities and relations. However, the large domain gap between off-the-shelf MMKGs and retrieval datasets, as well as the semantic gap between images and texts, makes fusing the MMKG difficult. In this paper, we pioneer a degree-free hypergraph solution that models many-to-many relations to address the challenges of heterogeneous sources and heterogeneous modalities. Specifically, we propose a hyperlink-based solution, the Multi-Modal Knowledge Hypergraph (MKHG), which bridges heterogeneous data via various hyperlinks to diversify sub-semantics. A hypergraph construction module first customizes various hyperedges to link the heterogeneous MMKG and retrieval databases. A multi-modal instance bagging module then explicitly selects instances to diversify the semantics, while a diverse concept aggregator flexibly adapts key sub-semantics. Finally, several losses are adopted to optimize the semantic space. Extensive experiments on two real-world datasets verify the effectiveness and explainability of our proposed method.
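To make the abstract's "degree-free" hypergraph idea concrete, the following is a minimal toy sketch (not the authors' implementation): hyperedges of arbitrary size group MMKG entities with retrieval-side images and texts in one incidence matrix, followed by a single round of hyperedge-mediated message passing. All node names, hyperedge groupings, and feature dimensions are hypothetical.

```python
import numpy as np

# Hypothetical toy data: hyperedges are "degree-free", i.e. each may link
# any number of nodes, mixing MMKG entities with images and captions.
nodes = ["img:beach", "txt:sunset over sea", "ent:Sea", "ent:Sunset", "img:harbor"]
hyperedges = [
    {"img:beach", "txt:sunset over sea", "ent:Sunset"},  # one sub-semantic
    {"img:beach", "img:harbor", "ent:Sea"},              # another sub-semantic
]

idx = {n: i for i, n in enumerate(nodes)}

# Incidence matrix H: H[v, e] = 1 iff node v belongs to hyperedge e.
H = np.zeros((len(nodes), len(hyperedges)))
for e, members in enumerate(hyperedges):
    for n in members:
        H[idx[n], e] = 1.0

# One round of message passing on random node features:
# aggregate node features into each hyperedge, then scatter back to nodes.
rng = np.random.default_rng(0)
X = rng.normal(size=(len(nodes), 4))                      # node features
edge_feat = (H.T @ X) / H.sum(axis=0, keepdims=True).T    # mean per hyperedge
deg = np.clip(H.sum(axis=1, keepdims=True), 1.0, None)    # node degree, avoid /0
X_new = (H @ edge_feat) / deg                             # mean over incident edges
```

After this step, a node such as `img:beach` (which sits in both hyperedges) mixes information from both sub-semantics, which is the many-to-many behavior an ordinary pairwise graph edge cannot express.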

Published

2023-06-26

How to Cite

Zeng, Y., Jin, Q., Bao, T., & Li, W. (2023). Multi-Modal Knowledge Hypergraph for Diverse Image Retrieval. Proceedings of the AAAI Conference on Artificial Intelligence, 37(3), 3376-3383. https://doi.org/10.1609/aaai.v37i3.25445

Section

AAAI Technical Track on Computer Vision III