Polysemic Semantic Instance Network for Cross-Modal Hashing

Authors

  • Shuo Han, Qufu Normal University
  • Qibing Qin, Weifang University
  • Kezhen Xie, Qingdao University of Technology
  • Wenfeng Zhang, Chongqing Normal University
  • Lei Huang, Ocean University of China

DOI:

https://doi.org/10.1609/aaai.v40i6.42459

Abstract

Hashing techniques are widely adopted in large-scale cross-modal retrieval due to their efficiency and low storage cost. However, semantic ambiguities, including polysemy, multi-object images, and missing semantic descriptions, significantly degrade alignment accuracy and retrieval performance. Most existing methods rely on one-to-one mappings that preserve only global average semantics and therefore fail to capture the intrinsic polysemous structures embedded within individual samples. To address this issue, we propose a novel Deep Polysemic Semantic Instance Hashing (DPSIH) method and design a Diverse Semantic Instance Embedding (DSIE) module. This module integrates local and global features through multi-head self-attention and residual learning, generating multiple diverse embeddings per sample to capture fine-grained and polysemous semantic structures. Furthermore, we design a multi-embedding semantic correlation constraint that relaxes strict alignment restrictions to improve robustness under partial alignment, and we introduce Maximum Mean Discrepancy (MMD) regularization to alleviate cross-modal distribution shifts. Additionally, we propose an embedding diversity mechanism that prevents all embeddings from collapsing into a central or averaged representation, thereby enhancing semantic diversity. Extensive experiments on four benchmark datasets demonstrate that DPSIH significantly outperforms state-of-the-art methods and improves the modeling of semantic ambiguity in cross-modal retrieval tasks.
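The abstract names two regularizers without giving their form: an MMD term between modalities and a diversity term over the multiple embeddings of one sample. The following is a minimal NumPy sketch of what such terms could look like, not the authors' formulation. The RBF kernel, the bandwidth `sigma`, the biased MMD estimator, and the squared-cosine diversity penalty are all assumptions made for illustration, as are the function names.

```python
import numpy as np


def rbf_kernel(a, b, sigma=1.0):
    """Pairwise RBF kernel between the rows of a and b (kernel choice is an assumption)."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def mmd2(x, y, sigma=1.0):
    """Biased estimate of squared MMD between two sets of embeddings."""
    return (rbf_kernel(x, x, sigma).mean()
            + rbf_kernel(y, y, sigma).mean()
            - 2.0 * rbf_kernel(x, y, sigma).mean())


def diversity_penalty(e):
    """Mean squared cosine similarity between distinct embeddings of one sample.

    e has shape (K, d) with K >= 2. The value is 1 when all K embeddings
    point in the same direction and 0 when they are mutually orthogonal.
    This penalty is a hypothetical stand-in for the paper's mechanism.
    """
    n = e / np.linalg.norm(e, axis=1, keepdims=True)
    k = e.shape[0]
    off_diag = n @ n.T - np.eye(k)  # zero out self-similarity
    return np.sum(off_diag**2) / (k * (k - 1))


rng = np.random.default_rng(0)
img = rng.normal(size=(64, 16))            # stand-in image embeddings
txt_shifted = rng.normal(size=(64, 16)) + 2.0  # text embeddings with a distribution shift
txt_aligned = rng.normal(size=(64, 16))        # text embeddings from the same distribution

shifted_gap = mmd2(img, txt_shifted, sigma=4.0)
aligned_gap = mmd2(img, txt_aligned, sigma=4.0)
```

Under these assumptions, minimizing `mmd2` pulls the image and text embedding distributions together, while minimizing `diversity_penalty` pushes the embeddings of a single sample apart so they do not collapse to one averaged vector.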

Published

2026-03-14

How to Cite

Han, S., Qin, Q., Xie, K., Zhang, W., & Huang, L. (2026). Polysemic Semantic Instance Network for Cross-Modal Hashing. Proceedings of the AAAI Conference on Artificial Intelligence, 40(6), 4592–4600. https://doi.org/10.1609/aaai.v40i6.42459

Section

AAAI Technical Track on Computer Vision III