Dual Adversarial Graph Neural Networks for Multi-label Cross-modal Retrieval

Authors

  • Shengsheng Qian (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences)
  • Dizhan Xue (University of Chinese Academy of Sciences)
  • Huaiwen Zhang (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences)
  • Quan Fang (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences)
  • Changsheng Xu (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Peng Cheng Laboratory)

DOI:

https://doi.org/10.1609/aaai.v35i3.16345

Keywords:

Language and Vision, Image and Video Retrieval

Abstract

Cross-modal retrieval has become an active research field with the expanding scale of multimodal data. To date, most existing methods transform multimodal data into a common representation space where semantic similarities between items can be directly measured across modalities. However, these methods typically suffer from the following limitations: 1) They usually attempt to bridge the modality gap by designing losses in the common representation space, which may not be sufficient to eliminate the potential heterogeneity of different modalities in that space. 2) They typically treat labels as independent individuals and ignore label relationships, which are important for constructing semantic links between multimodal data. In this work, we propose novel Dual Adversarial Graph Neural Networks (DAGNN), composed of dual generative adversarial networks and multi-hop graph neural networks, which learn modality-invariant and discriminative common representations for cross-modal retrieval. First, we construct the dual generative adversarial networks to project multimodal data into a common representation space. Second, we leverage the multi-hop graph neural networks, in which a layer aggregation mechanism is proposed to exploit multi-hop propagation information, to capture label correlation dependencies and learn inter-dependent classifiers. Comprehensive experiments on two cross-modal retrieval benchmark datasets, NUS-WIDE and MIRFlickr, demonstrate the superiority of DAGNN.
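
To make the two components concrete, the following is a minimal PyTorch-style sketch of the general idea described in the abstract. All module names, dimensions, and the concatenation-based layer aggregation are illustrative assumptions, not the authors' released implementation: modality-specific generators map image and text features into a shared space, a modality discriminator supplies the adversarial signal, and a multi-hop label GNN aggregates every propagation hop to produce inter-dependent label classifiers applied to the common representations.

```python
# Hypothetical sketch of a dual-adversarial + multi-hop label-GNN setup.
# Names and dimensions are illustrative, not the paper's implementation.
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """Single graph convolution layer: H' = relu(A_hat @ H @ W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, h, adj_hat):
        return torch.relu(adj_hat @ self.weight(h))

class MultiHopLabelGNN(nn.Module):
    """Stacks graph convolutions over a label correlation graph and aggregates
    all hops (here by concatenation) into inter-dependent label classifiers."""
    def __init__(self, label_dim, hidden_dim, common_dim, num_hops=3):
        super().__init__()
        dims = [label_dim] + [hidden_dim] * num_hops
        self.layers = nn.ModuleList(
            GraphConv(dims[i], dims[i + 1]) for i in range(num_hops)
        )
        self.project = nn.Linear(label_dim + hidden_dim * num_hops, common_dim)

    def forward(self, label_embed, adj_hat):
        hops, h = [label_embed], label_embed
        for layer in self.layers:
            h = layer(h, adj_hat)
            hops.append(h)
        # Layer aggregation: keep information from every propagation hop.
        return self.project(torch.cat(hops, dim=-1))  # (num_labels, common_dim)

class CommonSpaceGenerator(nn.Module):
    """Maps one modality's features into the shared representation space."""
    def __init__(self, feat_dim, common_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 1024), nn.ReLU(),
                                 nn.Linear(1024, common_dim))

    def forward(self, x):
        return self.net(x)

class ModalityDiscriminator(nn.Module):
    """Scores whether a common representation came from image or text;
    the generators are trained adversarially to fool it."""
    def __init__(self, common_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(common_dim, 256), nn.ReLU(),
                                 nn.Linear(256, 1))

    def forward(self, z):
        return self.net(z)

if __name__ == "__main__":
    num_labels, label_dim, common_dim = 24, 300, 512
    img_gen = CommonSpaceGenerator(feat_dim=4096, common_dim=common_dim)
    txt_gen = CommonSpaceGenerator(feat_dim=1000, common_dim=common_dim)
    disc = ModalityDiscriminator(common_dim)
    label_gnn = MultiHopLabelGNN(label_dim, 256, common_dim)

    adj_hat = torch.eye(num_labels)                  # placeholder label graph
    label_embed = torch.randn(num_labels, label_dim) # placeholder label embeddings
    classifiers = label_gnn(label_embed, adj_hat)    # (num_labels, common_dim)

    z_img = img_gen(torch.randn(8, 4096))            # image -> common space
    z_txt = txt_gen(torch.randn(8, 1000))            # text  -> common space
    logits = z_img @ classifiers.t()                 # multi-label predictions
    adv = disc(torch.cat([z_img, z_txt], 0))         # modality-confusion signal
```

In this sketch, retrieval would compare image and text common representations directly (e.g., by cosine similarity), while the adversarial and multi-label classification signals shape that space during training.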

Published

2021-05-18

How to Cite

Qian, S., Xue, D., Zhang, H., Fang, Q., & Xu, C. (2021). Dual Adversarial Graph Neural Networks for Multi-label Cross-modal Retrieval. Proceedings of the AAAI Conference on Artificial Intelligence, 35(3), 2440-2448. https://doi.org/10.1609/aaai.v35i3.16345

Issue

Vol. 35 No. 3 (2021)

Section

AAAI Technical Track on Computer Vision II