Dual Adversarial Graph Neural Networks for Multi-label Cross-modal Retrieval
Keywords: Language and Vision, Image and Video Retrieval
Abstract
Cross-modal retrieval has become an active field of study as the scale of multimodal data expands. To date, most existing methods transform multimodal data into a common representation space where semantic similarities between items can be directly measured across different modalities. However, these methods typically suffer from the following limitations: 1) They usually attempt to bridge the modality gap by designing losses in the common representation space, which may not be sufficient to eliminate the potential heterogeneity of different modalities in that space. 2) They typically treat labels as independent and ignore label relationships, which are important for constructing semantic links between multimodal data. In this work, we propose novel Dual Adversarial Graph Neural Networks (DAGNN), composed of dual generative adversarial networks and multi-hop graph neural networks, which learn modality-invariant and discriminative common representations for cross-modal retrieval. First, we construct the dual generative adversarial networks to project multimodal data into a common representation space. Second, we leverage the multi-hop graph neural networks, in which a layer aggregation mechanism is proposed to exploit multi-hop propagation information, to capture label correlation dependencies and learn inter-dependent classifiers. Comprehensive experiments conducted on two cross-modal retrieval benchmark datasets, NUS-WIDE and MIRFlickr, indicate the superiority of DAGNN.
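The idea of learning inter-dependent classifiers from a label graph can be illustrated with a minimal NumPy sketch. It is not the DAGNN implementation: the abstract does not give the propagation rule, the form of the layer aggregation, or the dimensions, so everything concrete here is an assumption made for illustration. That includes the GCN-style symmetric normalization, the ReLU activation, the mean over hop outputs as the aggregation, the toy adjacency matrix, and the hypothetical function names `normalize_adjacency` and `multi_hop_classifiers`.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetrically normalize A + I (an assumed, GCN-style choice)."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def multi_hop_classifiers(label_emb, adj, weights):
    """Propagate label embeddings hop by hop and aggregate all hop outputs.

    The mean over hops stands in for the paper's layer aggregation
    mechanism, whose exact form is not stated in the abstract.
    """
    a_norm = normalize_adjacency(adj)
    h = label_emb
    hop_outputs = []
    for w in weights:
        h = np.maximum(a_norm @ h @ w, 0.0)  # one propagation hop + ReLU
        hop_outputs.append(h)
    return np.mean(hop_outputs, axis=0)      # assumed aggregation: mean


# Toy setup (all values hypothetical): 4 labels on a chain-shaped graph.
rng = np.random.default_rng(0)
num_labels, dim = 4, 8
adj = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)
label_emb = rng.normal(size=(num_labels, dim))
weights = [rng.normal(scale=0.1, size=(dim, dim)) for _ in range(3)]  # 3 hops

# One classifier vector per label, living in the common representation space.
classifiers = multi_hop_classifiers(label_emb, adj, weights)

# Score 5 stand-in common-space features (image or text) against every label.
features = rng.normal(size=(5, dim))
logits = features @ classifiers.T
```

Because each classifier vector is built from the embeddings of neighbouring labels, correlated labels end up with related classifiers, which is the intuition behind treating labels as a graph rather than as independent targets. In this sketch the same `classifiers` matrix would score features from either modality once both are projected into the common space.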
How to Cite
Qian, S., Xue, D., Zhang, H., Fang, Q., & Xu, C. (2021). Dual Adversarial Graph Neural Networks for Multi-label Cross-modal Retrieval. Proceedings of the AAAI Conference on Artificial Intelligence, 35(3), 2440-2448. Retrieved from https://ojs.aaai.org/index.php/AAAI/article/view/16345
AAAI Technical Track on Computer Vision II