Duplex Rewards Optimization for Test-Time Composed Image Retrieval

Authors

  • Haoliang Zhou, Tianjin University of Technology
  • Feifei Zhang, Tianjin University of Technology
  • Changsheng Xu, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences; Peng Cheng Laboratory

DOI:

https://doi.org/10.1609/aaai.v40i16.38369

Abstract

Composed Image Retrieval (CIR) combines a reference image with text to retrieve the intended target image. Recently, zero-shot CIR has gained significant attention by eliminating the need for the labeled triplets required in supervised CIR. However, it inevitably demands an additional training corpus, storage, and computational resources, limiting its applicability in real-world scenarios. Inspired by advances in Test-Time Adaptation (TTA), we propose a test-time CIR setting named TT-CIR, which aims to efficiently adapt models to unlabeled test samples while reducing resource consumption. Within the TT-CIR setting, we find that naively introducing existing TTA methods (e.g., reward-based ones) into CIR faces two key challenges: (1) a modification-restricted reward pool, which limits the exploration of semantically relevant candidate rewards; and (2) conservative knowledge feedback, which inhibits the adaptability of rewards to the current data distribution. To address these challenges, we propose a test-time reinforcement learning framework that integrates a Counterfactual-guided Multinomial Sampling (CMS) strategy and a Duplex Rewards Modeling (DRM) module. CMS explores a candidate reward pool that is visually similar and semantically relevant to the given query, while DRM generates stable and adaptive duplex rewards to guide model adaptation. Extensive experiments demonstrate the superiority and adaptability of our method over existing approaches.
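
The abstract does not describe how CMS or DRM are implemented, so the following is only a rough illustration of the generic idea behind reward-based test-time adaptation with multinomial sampling, not the authors' method. It treats query-to-candidate similarity scores as logits, samples candidates in proportion to their softmax probabilities, and applies one REINFORCE-style update with a mean-reward baseline. The names softmax, reinforce_step, and reward_fn, the toy reward, and all hyperparameters are assumptions made for this sketch.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, reward_fn, num_samples=4, lr=0.5, rng=None):
    """One generic policy-gradient update on candidate logits.

    logits: query-to-candidate similarity scores (hypothetical inputs).
    reward_fn: maps a candidate index to a scalar reward; a hypothetical
        stand-in for whatever reward model is used at test time.
    """
    if rng is None:
        rng = random.Random(0)
    probs = softmax(logits)
    # Multinomial sampling: draw candidate indices in proportion to probs.
    sampled = rng.choices(range(len(logits)), weights=probs, k=num_samples)
    rewards = [reward_fn(i) for i in sampled]
    # Mean sampled reward serves as a simple variance-reducing baseline.
    baseline = sum(rewards) / len(rewards)
    grad = [0.0] * len(logits)
    for idx, r in zip(sampled, rewards):
        advantage = r - baseline
        for j in range(len(logits)):
            indicator = 1.0 if j == idx else 0.0
            # d log p(idx) / d logit_j = indicator - probs[j]
            grad[j] += advantage * (indicator - probs[j]) / num_samples
    # Gradient ascent on the estimated expected reward.
    return [x + lr * g for x, g in zip(logits, grad)]


# Toy usage: candidate 1 is the only one the (assumed) reward favors.
rng = random.Random(0)
scores = [1.0, 0.8, 0.2]
for _ in range(50):
    scores = reinforce_step(scores, lambda i: 1.0 if i == 1 else 0.0, rng=rng)
```

In this toy setting the logit of the rewarded candidate can only stay the same or grow from step to step, which is the behavior a reward-guided update is meant to produce; a real system would update model parameters rather than the scores directly.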

Published

2026-03-14

How to Cite

Zhou, H., Zhang, F., & Xu, C. (2026). Duplex Rewards Optimization for Test-Time Composed Image Retrieval. Proceedings of the AAAI Conference on Artificial Intelligence, 40(16), 13629–13637. https://doi.org/10.1609/aaai.v40i16.38369

Section

AAAI Technical Track on Computer Vision XIII