Retrieval-Augmented Primitive Representations for Compositional Zero-Shot Learning

Authors

  • Chenchen Jing, Zhejiang University
  • Yukun Li, Northwestern Polytechnical University
  • Hao Chen, Zhejiang University
  • Chunhua Shen, Zhejiang University

DOI:

https://doi.org/10.1609/aaai.v38i3.28043

Keywords:

CV: Language and Vision, CV: Image and Video Retrieval

Abstract

Compositional zero-shot learning (CZSL) aims to recognize unseen attribute-object compositions by learning from seen compositions. Composing the learned knowledge of seen primitives, i.e., attributes or objects, into novel compositions is critical for CZSL. In this work, we propose to explicitly retrieve knowledge of seen primitives for compositional zero-shot learning. We present a retrieval-augmented method, which augments standard multi-path classification methods with two retrieval modules. Specifically, we construct two databases storing the attribute and object representations of training images, respectively. For an input training/testing image, we use two retrieval modules to retrieve representations of training images with the same attribute and object, respectively. The primitive representations of the input image are augmented by using the retrieved representations, for composition recognition. By referencing semantically similar images, the proposed method is capable of recalling knowledge of seen primitives for compositional generalization. Experiments on three widely used datasets show the effectiveness of the proposed method.
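
The retrieval-and-augmentation step described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the cosine-similarity top-k lookup, the softmax weighting with a temperature, the residual addition used for fusion, the toy 4-dimensional features, and the names `retrieve_top_k` and `augment`. The sketch only shows the general shape of the idea: one database per primitive type, a lookup per primitive, and a primitive representation enriched with what was retrieved.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def retrieve_top_k(query, database, k):
    """Return indices and cosine similarities of the k most similar rows.

    Assumption: retrieval is by cosine similarity; the abstract does not
    specify the retrieval criterion.
    """
    sims = l2_normalize(database) @ l2_normalize(query)
    top = np.argsort(-sims)[:k]
    return top, sims[top]

def augment(query, database, k=2, temperature=0.1):
    """Add a similarity-weighted mix of retrieved rows to the query.

    Assumption: softmax weighting and residual fusion are illustrative
    choices, not the fusion used by the authors.
    """
    idx, sims = retrieve_top_k(query, database, k)
    weights = np.exp((sims - sims.max()) / temperature)
    weights /= weights.sum()
    retrieved = weights @ database[idx]
    return query + retrieved, idx

# Hypothetical toy databases: one row per training image.
attr_db = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0, 0.0],
                    [0.9, 0.1, 0.0, 0.0]])
obj_db = np.array([[0.0, 0.0, 1.0, 0.0],
                   [0.0, 0.0, 0.0, 1.0],
                   [0.0, 0.0, 0.6, 0.8],
                   [1.0, 1.0, 0.0, 0.0]])

# Hypothetical attribute and object features of one input image.
attr_query = np.array([1.0, 0.0, 0.0, 0.0])
obj_query = np.array([0.0, 0.0, 0.0, 1.0])

attr_aug, attr_idx = augment(attr_query, attr_db)
obj_aug, obj_idx = augment(obj_query, obj_db)
```

In this toy setup the augmented attribute and object vectors would then feed the composition classifier; how the paper combines them for recognition is not stated in the abstract and is left out of the sketch.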

Published

2024-03-24

How to Cite

Jing, C., Li, Y., Chen, H., & Shen, C. (2024). Retrieval-Augmented Primitive Representations for Compositional Zero-Shot Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 38(3), 2652-2660. https://doi.org/10.1609/aaai.v38i3.28043

Issue

Vol. 38 No. 3

Section

AAAI Technical Track on Computer Vision II