Self-Supervised Multi-Modal Knowledge Graph Contrastive Hashing for Cross-Modal Search
DOI:
https://doi.org/10.1609/aaai.v38i12.29280
Keywords:
ML: Multimodal Learning, CV: Language and Vision
Abstract
Deep cross-modal hashing provides an effective and efficient unified representation learning solution for cross-modal search. However, existing methods neglect the implicit fine-grained multimodal knowledge relations between modalities, such as when an image contains information that is not directly described in the paired text. To tackle this problem, we propose a novel self-supervised multi-grained multi-modal knowledge graph contrastive hashing method for cross-modal search (CMGCH). Firstly, to capture implicit fine-grained cross-modal semantic associations, a multi-modal knowledge graph is constructed, which represents the implicit multimodal knowledge relations between image and text as inter-modal and intra-modal semantic associations. Secondly, a cross-modal graph contrastive attention network is proposed to reason over the multi-modal knowledge graph and fully learn the implicit fine-grained inter-modal and intra-modal knowledge relations. Thirdly, a cross-modal multi-granularity contrastive embedding learning mechanism is proposed, which fuses the global coarse-grained and local fine-grained embeddings via a multi-head attention mechanism for inter-modal and intra-modal contrastive learning, so as to enhance the cross-modal unified representations with stronger discriminativeness and semantic-consistency-preserving power. With the joint training of intra-modal and inter-modal contrast, both the invariant and the modality-specific information of the different modalities can be maintained in the final unified cross-modal hash space. Extensive experiments on several cross-modal benchmark datasets demonstrate that the proposed CMGCH outperforms the state-of-the-art methods.
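The abstract does not specify the training objective; a common instantiation of the inter-modal contrastive learning it describes treats matched image-text pairs in a batch as positives and all other pairings as negatives, applied to continuously relaxed hash codes. The sketch below is a generic symmetric InfoNCE loss over tanh-relaxed codes under these assumptions, not the exact CMGCH objective; all function names and the temperature value are illustrative.

```python
import numpy as np

def relaxed_hash(z):
    # Continuous relaxation of binary codes: tanh maps features into (-1, 1);
    # at inference time, sign(z) would give the binary hash code.
    return np.tanh(z)

def inter_modal_info_nce(img_feats, txt_feats, tau=0.1):
    """Symmetric InfoNCE over relaxed hash codes.

    Matched image-text pairs (same row index) are positives; all other
    rows in the batch serve as negatives. Illustrative only, not the
    published CMGCH loss.
    """
    h_i = relaxed_hash(img_feats)
    h_t = relaxed_hash(txt_feats)
    # Cosine-similarity matrix, scaled by temperature tau.
    h_i = h_i / np.linalg.norm(h_i, axis=1, keepdims=True)
    h_t = h_t / np.linalg.norm(h_t, axis=1, keepdims=True)
    sim = h_i @ h_t.T / tau

    def nll_diag(s):
        # Negative log-likelihood of the diagonal (positive) entries
        # under a row-wise softmax, with max-subtraction for stability.
        s = s - s.max(axis=1, keepdims=True)
        log_p = s - np.log(np.exp(s).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_p))

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (nll_diag(sim) + nll_diag(sim.T))

rng = np.random.default_rng(0)
img = rng.normal(size=(8, 16))
txt = img + 0.05 * rng.normal(size=(8, 16))  # nearly aligned pairs
loss_aligned = inter_modal_info_nce(img, txt)
loss_random = inter_modal_info_nce(img, rng.normal(size=(8, 16)))
# Well-aligned pairs should incur a lower contrastive loss than random ones.
print(loss_aligned, loss_random)
```

Minimizing such a loss jointly in both directions is what lets the unified hash space preserve semantic consistency across modalities while the intra-modal counterpart (the same form applied within one modality's augmented views) retains modality-specific structure.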
Published
2024-03-24
How to Cite
Liang, M., Du, J., Liang, Z., Xing, Y., Huang, W., & Xue, Z. (2024). Self-Supervised Multi-Modal Knowledge Graph Contrastive Hashing for Cross-Modal Search. Proceedings of the AAAI Conference on Artificial Intelligence, 38(12), 13744-13753. https://doi.org/10.1609/aaai.v38i12.29280
Section
AAAI Technical Track on Machine Learning III