TY - JOUR
AU - Yang, Fan
AU - Wang, Zheng
AU - Xiao, Jing
AU - Satoh, Shin'ichi
PY - 2020/04/03
Y2 - 2024/03/28
TI - Mining on Heterogeneous Manifolds for Zero-Shot Cross-Modal Image Retrieval
JF - Proceedings of the AAAI Conference on Artificial Intelligence
JA - AAAI
VL - 34
IS - 07
SE - AAAI Technical Track: Vision
DO - 10.1609/aaai.v34i07.6949
UR - https://ojs.aaai.org/index.php/AAAI/article/view/6949
SP - 12589-12596
AB - <p>Most recent approaches for zero-shot cross-modal image retrieval map images from different modalities into a uniform feature space to exploit their relevance by using a pre-trained model. Based on the observation that manifolds of zero-shot images are usually deformed and incomplete, we argue that the manifolds of unseen classes are inevitably distorted during the training of a two-stream model that simply maps images from different modalities into a uniform space. This issue directly leads to poor cross-modal retrieval performance. We propose a bi-directional random walk scheme to mine more reliable relationships between images by traversing heterogeneous manifolds in the feature space of each modality. Our proposed method benefits from intra-modal distributions to alleviate the interference caused by noisy similarities in the cross-modal feature space. As a result, we achieve substantial improvement on the thermal <em>vs.</em> visible image retrieval task. The code of this paper: https://github.com/fyang93/cross-modal-retrieval</p>
ER -