Dual Adversarial Graph Neural Networks for Multi-label Cross-modal Retrieval

Shengsheng Qian; Dizhan Xue; Huaiwen Zhang; Quan Fang; Changsheng Xu

doi:10.1609/aaai.v35i3.16345

Authors

Shengsheng Qian (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences)
Dizhan Xue (University of Chinese Academy of Sciences)
Huaiwen Zhang (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences)
Quan Fang (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences)
Changsheng Xu (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Peng Cheng Laboratory)

DOI:

https://doi.org/10.1609/aaai.v35i3.16345

Keywords:

Language and Vision, Image and Video Retrieval

Abstract

Cross-modal retrieval has become an active field of study as the scale of multimodal data expands. Most existing methods transform multimodal data into a common representation space in which semantic similarities between items can be measured directly across modalities. However, these methods typically suffer from the following limitations: 1) They usually attempt to bridge the modality gap by designing losses in the common representation space, which may not be sufficient to eliminate the potential heterogeneity of different modalities in that space. 2) They typically treat labels as independent and ignore label relationships, which are important for constructing semantic links between multimodal data. In this work, we propose novel Dual Adversarial Graph Neural Networks (DAGNN), composed of dual generative adversarial networks and multi-hop graph neural networks, which learn modality-invariant and discriminative common representations for cross-modal retrieval. First, we construct the dual generative adversarial networks to project multimodal data into a common representation space. Second, we leverage the multi-hop graph neural networks, in which a layer aggregation mechanism is proposed to exploit multi-hop propagation information, to capture label correlation dependencies and learn inter-dependent classifiers. Comprehensive experiments on two cross-modal retrieval benchmark datasets, NUS-WIDE and MIRFlickr, indicate the superiority of DAGNN.
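To make the idea of learning inter-dependent classifiers from label correlations more concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify how the label graph is built, how information is propagated, or how layers are aggregated. The sketch assumes a row-normalised label co-occurrence graph with self-loops, weight-free propagation, and a simple mean over hop outputs as the layer aggregation. All function names (`cooccurrence_adjacency`, `multi_hop_classifiers`, `label_scores`) are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def cooccurrence_adjacency(label_sets: List[List[int]], num_labels: int) -> Matrix:
    """Row-normalised label co-occurrence graph with self-loops (assumed construction)."""
    counts = [[0.0] * num_labels for _ in range(num_labels)]
    for labels in label_sets:
        for i in labels:
            for j in labels:
                counts[i][j] += 1.0  # i == j contributes the self-loop weight
    for i in range(num_labels):
        if counts[i][i] == 0.0:
            counts[i][i] = 1.0  # labels never observed still keep a self-loop
    return [[value / sum(row) for value in row] for row in counts]


def multi_hop_classifiers(adj: Matrix, label_emb: Matrix, hops: int) -> Matrix:
    """Propagate label embeddings for `hops` steps and average all hop outputs.

    Averaging is an assumed stand-in for the layer aggregation mechanism.
    """
    hop_outputs = [label_emb]
    current = label_emb
    for _ in range(hops):
        current = matmul(adj, current)  # one more hop over the label graph
        hop_outputs.append(current)
    count = len(hop_outputs)
    dim = len(label_emb[0])
    return [
        [sum(h[i][d] for h in hop_outputs) / count for d in range(dim)]
        for i in range(len(label_emb))
    ]


def label_scores(feature: List[float], classifiers: Matrix) -> List[float]:
    """Score a common-space feature against each label's classifier vector."""
    return [sum(f * w for f, w in zip(feature, clf)) for clf in classifiers]


# Toy data: labels 0 and 1 always co-occur, label 2 appears alone.
annotations = [[0, 1], [0, 1], [2]]
adjacency = cooccurrence_adjacency(annotations, num_labels=3)
one_hot_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
classifiers = multi_hop_classifiers(adjacency, one_hot_embeddings, hops=2)
scores = label_scores([1.0, 0.0, 0.0], classifiers)
```

In this toy setup, a feature aligned with label 0 also receives a positive score for label 1, because the two labels co-occur and their classifiers share information through the graph, while the unrelated label 2 scores zero. A trained model would use learned propagation weights and learned label embeddings rather than the fixed one-hot vectors used here.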
