Heterogeneous Uncertainty-Guided Composed Image Retrieval with Fine-Grained Probabilistic Learning

Authors

  • Haomiao Tang, Tsinghua University
  • Jinpeng Wang, Harbin Institute of Technology, Shenzhen
  • Minyi Zhao, Fudan University
  • GuangHao Meng, Tsinghua University
  • Ruisheng Luo, Tsinghua University
  • Long Chen, The Hong Kong University of Science and Technology
  • Shu-Tao Xia, Tsinghua University

DOI:

https://doi.org/10.1609/aaai.v40i11.37898

Abstract

Composed Image Retrieval (CIR) enables image search by combining a reference image with modification text. Noise inherent in CIR triplets introduces intrinsic uncertainty and threatens model robustness. Probabilistic learning approaches have shown promise in addressing such issues; however, they fall short for CIR because they model each instance holistically and treat queries and targets homogeneously. This paper introduces a Heterogeneous Uncertainty-Guided (HUG) paradigm to overcome these limitations. HUG uses a fine-grained probabilistic learning framework in which queries and targets are represented by Gaussian embeddings that capture detailed concepts and uncertainties. We customize heterogeneous uncertainty estimations for multi-modal queries and uni-modal targets. Given a query, we capture uncertainties regarding not only uni-modal content quality but also multi-modal coordination, and then apply a provable dynamic weighting mechanism to derive the comprehensive query uncertainty. We further design uncertainty-guided objectives, including a holistic query-target contrast and fine-grained contrasts with comprehensive negative sampling strategies, which effectively enhance discriminative learning. Experiments on benchmarks demonstrate HUG's effectiveness beyond state-of-the-art baselines, with analysis that justifies the technical contributions.
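To make the idea of Gaussian embeddings concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes each query and target is a diagonal Gaussian given by a mean vector and a per-dimension standard deviation, and it ranks targets by the squared 2-Wasserstein distance between diagonal Gaussians. The abstract does not state which distance or similarity HUG uses, so that choice, the function names (`wasserstein2_sq`, `rank_targets`), and the toy vectors are all assumptions made for illustration.

```python
# Illustrative sketch only: the distance, names, and data are assumptions,
# not details taken from the paper.

def wasserstein2_sq(mu_a, sigma_a, mu_b, sigma_b):
    """Squared 2-Wasserstein distance between two diagonal Gaussians.

    For diagonal covariances this reduces to the squared Euclidean distance
    between the means plus that between the standard deviations.
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_a, mu_b))
    std_term = sum((a - b) ** 2 for a, b in zip(sigma_a, sigma_b))
    return mean_term + std_term


def rank_targets(query, targets):
    """Return target indices ordered from closest to farthest from the query."""
    q_mu, q_sigma = query
    dists = [wasserstein2_sq(q_mu, q_sigma, mu, sigma) for mu, sigma in targets]
    return sorted(range(len(targets)), key=lambda i: dists[i])


# Toy data: (mean, standard deviation) pairs in a 2-D embedding space.
query = ([1.0, 0.0], [0.1, 0.1])
targets = [
    ([0.0, 1.0], [0.1, 0.1]),  # different mean, same uncertainty
    ([0.9, 0.1], [0.2, 0.1]),  # similar mean, similar uncertainty
    ([1.0, 0.0], [1.0, 1.0]),  # same mean, much larger uncertainty
]
ranking = rank_targets(query, targets)
```

In this toy setup the third target shares the query's mean but is ranked below the second because its standard deviations are far larger, which shows how a probabilistic embedding lets uncertainty influence retrieval in a way a point embedding cannot.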

Published

2026-03-14

How to Cite

Tang, H., Wang, J., Zhao, M., Meng, G., Luo, R., Chen, L., & Xia, S.-T. (2026). Heterogeneous Uncertainty-Guided Composed Image Retrieval with Fine-Grained Probabilistic Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(11), 9386-9394. https://doi.org/10.1609/aaai.v40i11.37898

Issue

Section

AAAI Technical Track on Computer Vision VIII