DiCA: Disambiguated Contrastive Alignment for Cross-Modal Retrieval with Partial Labels

Authors

  • Chao Su (The College of Computer Science, Sichuan University, Chengdu, China)
  • Huiming Zheng (Sichuan National Innovation New Vision UHD Video Technology Co., Ltd., Chengdu, China)
  • Dezhong Peng (The College of Computer Science, Sichuan University, Chengdu, China; Sichuan National Innovation New Vision UHD Video Technology Co., Ltd., Chengdu, China)
  • Xu Wang (The College of Computer Science, Sichuan University, Chengdu, China)

DOI:

https://doi.org/10.1609/aaai.v39i19.34271

Abstract

Cross-modal retrieval aims to retrieve relevant data across different modalities. Existing cross-modal retrieval methods achieve encouraging results, but they depend on large amounts of costly labeled data. To reduce annotation costs while maintaining performance, this paper focuses on an unexplored but challenging problem: cross-modal retrieval with partial labels (PLCMR). PLCMR faces the dual challenges of annotation ambiguity and the modality gap. To address them, we propose a novel method termed disambiguated contrastive alignment (DiCA). Specifically, DiCA introduces a non-candidate boosted disambiguation learning mechanism (NBDL), which balances the losses on candidate and non-candidate labels, which serve to eliminate label ambiguity and narrow the modality gap. Moreover, DiCA presents an instance-prototype representation learning mechanism (IPRL) that strengthens the model by further reducing the modality gap at both the instance and prototype levels. Together, NBDL and IPRL allow DiCA to address both annotation ambiguity and the modality gap in cross-modal retrieval with partial labels. Experiments on four benchmarks show that DiCA outperforms existing state-of-the-art methods.
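
As a rough illustration of the partial-label setting described in the abstract, the sketch below shows one generic way to combine a loss on candidate labels with a loss on non-candidate labels. It is not the NBDL formulation from the paper, which the abstract does not specify. The function names, the `weight` parameter, and the particular form of each term are assumptions chosen only to make the idea concrete.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def partial_label_loss(logits, candidate_mask, weight=1.0):
    """Generic partial-label loss for a single sample (illustrative only).

    candidate_mask[j] is 1 if class j is in the sample's candidate label set
    (which is assumed to contain the true label), else 0.
    `weight` is a hypothetical knob that trades off the two terms.
    """
    eps = 1e-12
    probs = softmax(logits)

    # Candidate term: encourage probability mass to fall inside the candidate set.
    candidate_mass = sum(p for p, c in zip(probs, candidate_mask) if c)
    candidate_term = -math.log(candidate_mass + eps)

    # Non-candidate term: push down the probability of every label that is
    # known not to be the true one.
    non_candidate_term = -sum(
        math.log(1.0 - p + eps) for p, c in zip(probs, candidate_mask) if not c
    )

    return candidate_term + weight * non_candidate_term


# Three classes; classes 0 and 1 are candidates, class 2 is ruled out.
mask = [1, 1, 0]
loss_on_candidate = partial_label_loss([5.0, 0.0, 0.0], mask)
loss_on_non_candidate = partial_label_loss([0.0, 0.0, 5.0], mask)
```

In this sketch, a prediction concentrated on a candidate label yields a lower loss than one concentrated on a non-candidate label. Deciding among the candidates themselves (the disambiguation step) and aligning the two modalities are left out entirely.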

Published

2025-04-11

How to Cite

Su, C., Zheng, H., Peng, D., & Wang, X. (2025). DiCA: Disambiguated Contrastive Alignment for Cross-Modal Retrieval with Partial Labels. Proceedings of the AAAI Conference on Artificial Intelligence, 39(19), 20610-20618. https://doi.org/10.1609/aaai.v39i19.34271

Section

AAAI Technical Track on Machine Learning V