Adaptive Graph Attention Based Discrete Hashing for Incomplete Cross-modal Retrieval

Authors

  • Shuang Zhang, Hebei Normal University
  • Yue Wu, Hebei Normal University
  • Lei Shi, Communication University of China
  • Huilong Jin, Hebei Normal University
  • Feifei Kou, Beijing University of Posts and Telecommunications
  • Pengfei Zhang, Anhui University of Science & Technology
  • Mingying Xu, North China University of Technology
  • Pengtao Lv, Henan University of Technology

DOI:

https://doi.org/10.1609/aaai.v40i33.40067

Abstract

Cross-modal hashing has emerged as a pivotal solution for efficient retrieval across diverse modalities, such as images and texts, by mapping them into a compact binary hash space. However, in real-world scenarios, modality data are often missing or misaligned. Most existing methods rely on fully paired training data and ignore missing or misaligned modality data, resulting in semantic inconsistencies. To address these challenges, we propose an Adaptive Graph Attention-Based Discrete Hashing (AGADH) method, which consists of three parts. First, to solve the problem of missing modalities, AGADH employs a masked completion strategy to reconstruct missing modalities. Second, to mitigate semantic misalignment, AGADH leverages a Graph Attention Network (GAT) encoder-decoder architecture with an alignment module to construct features from different modalities. Third, to enhance fusion performance, an adaptive fusion module dynamically adjusts the contributions of the image and text modalities through learnable weighting coefficients. Extensive experiments on three benchmark datasets, MS-COCO, NUS-WIDE, and MIRFlickr-25K, demonstrate that AGADH outperforms state-of-the-art methods in both fully paired and incompletely paired scenarios, showing its robustness and effectiveness in cross-modal retrieval tasks.
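
The abstract does not give the fusion formula, so the following is only a minimal sketch of one common way to weight two modalities with learnable coefficients and then binarize the result. It is not the authors' implementation. The function names (`softmax`, `adaptive_fuse`, `binarize`, `hamming_distance`), the softmax normalization of the weights, and the sign-based binarization are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fuse(image_feat, text_feat, fusion_logits):
    """Weighted sum of two same-length feature vectors.

    fusion_logits is a pair of scalars standing in for learnable
    parameters (hypothetical; a real model would update them by
    gradient descent during training).
    """
    w_img, w_txt = softmax(fusion_logits)
    return [w_img * a + w_txt * b for a, b in zip(image_feat, text_feat)]


def binarize(features):
    """Map real-valued features to a {-1, +1} hash code by sign (assumed rule)."""
    return [1 if x >= 0 else -1 for x in features]


def hamming_distance(code_a, code_b):
    """Count the positions where two hash codes differ."""
    return sum(1 for a, b in zip(code_a, code_b) if a != b)


# Toy features; the values are made up for illustration only.
image_feat = [0.8, -0.2, 0.5, -0.9]
text_feat = [0.6, 0.4, -0.1, -0.7]

# Equal logits give equal weights, so the fusion is a plain average here.
fused = adaptive_fuse(image_feat, text_feat, [0.0, 0.0])
code = binarize(fused)  # [1, 1, 1, -1]
```

In this sketch, raising the first logit relative to the second shifts the fused vector toward the image features, which is the kind of dynamic re-weighting the abstract describes. Retrieval would then compare codes with `hamming_distance`.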

Published

2026-03-14

How to Cite

Zhang, S., Wu, Y., Shi, L., Jin, H., Kou, F., Zhang, P., … Lv, P. (2026). Adaptive Graph Attention Based Discrete Hashing for Incomplete Cross-modal Retrieval. Proceedings of the AAAI Conference on Artificial Intelligence, 40(33), 28382–28390. https://doi.org/10.1609/aaai.v40i33.40067

Section

AAAI Technical Track on Machine Learning X