HaNa: Hardness and Noise-Aware Robust Cross-modal Retrieval

Authors

  • Fangming Zhong Dalian University of Technology
  • Haiquan Yu Dalian University of Technology
  • Cun Zhu Dalian University of Technology
  • Suhua Zhang Dalian University of Technology

DOI:

https://doi.org/10.1609/aaai.v40i19.38689

Abstract

Noisy correspondence poses a significant challenge in cross-modal retrieval because it is inherently difficult to identify and correct. Existing methods try to limit the influence of noisy samples through a weighting mechanism, yet their performance still degrades as the noise level increases. Specifically, all clean samples are assigned the same weight of 1, which ignores sample hardness. In addition, the weights of noisy samples approach 0, so sample diversity is overlooked. To address these issues, we propose HaNa, a Hardness and Noise-aware robust cross-modal retrieval method. HaNa introduces a momentum-based reweighting mechanism that adaptively balances learning difficulty across clean samples, avoiding the risk of overfitting and accumulated partitioning bias. Moreover, HaNa addresses the near-zero weighting of noisy data from a new perspective: it employs an Asymmetric Noise-aware Regularization Loss (ANRL) that treats identified noisy data as negative samples during optimization, exploiting sample diversity to further improve generalization. Extensive experiments demonstrate that HaNa achieves superior matching accuracy and stability, especially in high-noise scenarios, outperforming state-of-the-art methods.
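The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' formulation: the abstract gives no equations, so the function names, the loss-based hardness score, the weight range [1, 2], the `momentum` and `margin` parameters, and the hinge form of the penalty are all assumptions chosen for illustration. The first function updates clean-sample weights with an exponential moving average so that harder (higher-loss) pairs gradually receive more weight; the second penalizes high similarity for pairs flagged as noisy, treating them as negatives instead of giving them zero weight.

```python
def ema_hardness_weights(prev_weights, losses, momentum=0.9):
    """Momentum (EMA) update of per-sample weights for clean pairs.

    Assumption: hardness is the min-max normalised loss, and the target
    weight is 1 + hardness, so weights stay in [1, 2].
    """
    lo, hi = min(losses), max(losses)
    span = hi - lo
    new_weights = []
    for w, loss in zip(prev_weights, losses):
        hardness = (loss - lo) / span if span > 0 else 0.0
        target = 1.0 + hardness
        # Blend the previous weight with the new target to damp fluctuations.
        new_weights.append(momentum * w + (1.0 - momentum) * target)
    return new_weights


def noisy_as_negative_loss(similarities, is_noisy, margin=0.2):
    """Hinge penalty on pairs flagged as noisy (hypothetical stand-in, not ANRL).

    A flagged pair contributes max(0, similarity - margin), which pushes its
    image-text similarity below the margin, as for an ordinary negative pair.
    """
    penalties = [max(0.0, s - margin)
                 for s, noisy in zip(similarities, is_noisy) if noisy]
    return sum(penalties) / len(penalties) if penalties else 0.0


# Example: three clean pairs with increasing loss, and three pairs of which
# the first two are flagged as noisy.
weights = ema_hardness_weights([1.0, 1.0, 1.0], [0.0, 0.5, 1.0], momentum=0.5)
penalty = noisy_as_negative_loss([0.9, 0.1, 0.5], [True, True, False])
```

With `momentum` close to 1 the weights change slowly between epochs, which is one plausible way to limit the effect of a single unreliable clean/noisy partition; the paper's actual mechanism may differ.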

Published

2026-03-14

How to Cite

Zhong, F., Yu, H., Zhu, C., & Zhang, S. (2026). HaNa: Hardness and Noise-Aware Robust Cross-modal Retrieval. Proceedings of the AAAI Conference on Artificial Intelligence, 40(19), 16495–16503. https://doi.org/10.1609/aaai.v40i19.38689

Issue

Section

AAAI Technical Track on Data Mining & Knowledge Management III