Leveraging Sub-class Discrimination for Compositional Zero-Shot Learning

Authors

  • Xiaoming Hu University of Science and Technology of China
  • Zilei Wang University of Science and Technology of China

DOI:

https://doi.org/10.1609/aaai.v37i1.25168

Keywords:

CV: Object Detection & Categorization, CV: Applications, ML: Transfer, Domain Adaptation, Multi-Task Learning

Abstract

Compositional Zero-Shot Learning (CZSL) aims to identify unseen compositions of previously seen attributes and objects during the test phase. In real images, the visual appearances of attributes and objects (primitive concepts) generally interact with each other: the visual appearance of an attribute may change when composed with different objects, and vice versa. However, previous works overlook this important property. In this paper, we introduce a simple yet effective approach that leverages sub-class discrimination. Specifically, we define the primitive concepts in different compositions as sub-classes, and then maintain sub-class discrimination to address the above challenge. More specifically, inspired by the observation that composed recognition models can account for the differences across sub-classes, we first propose to impose embedding alignment between the composed and disentangled recognition to incorporate sub-class discrimination at the feature level. We then develop prototype modulator networks that adjust the class prototypes w.r.t. the composition information, which enhances sub-class discrimination at the classifier level. We conduct extensive experiments on challenging benchmark datasets and achieve considerable performance improvements over state-of-the-art approaches, indicating the effectiveness of our method. Our code is available at https://github.com/hxm97/SCD-CZSL.
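To make the prototype-modulator idea concrete, the following is a minimal, hypothetical sketch of conditioning a primitive-class prototype on a composition embedding via learned scale-and-shift (FiLM-style) modulation. The class name, architecture, and dimensions here are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class PrototypeModulator(nn.Module):
    """Hypothetical sketch: adjust a primitive-class prototype
    based on an embedding of the composition it appears in."""
    def __init__(self, dim):
        super().__init__()
        # Predict a per-dimension scale and shift from the composition embedding
        self.film = nn.Linear(dim, 2 * dim)

    def forward(self, prototype, comp_emb):
        scale, shift = self.film(comp_emb).chunk(2, dim=-1)
        # Composition-aware variant of the base class prototype (sub-class prototype)
        return prototype * (1 + scale) + shift

dim = 64
modulator = PrototypeModulator(dim)
attr_proto = torch.randn(dim)   # base prototype for an attribute, e.g. "wet"
comp_emb = torch.randn(dim)     # embedding of a composition, e.g. "wet dog"
# Sub-class prototype: "wet" as it appears in the composition "wet dog"
sub_proto = modulator(attr_proto, comp_emb)
print(sub_proto.shape)
```

In this sketch, classification would score image features against the modulated sub-class prototypes rather than the fixed base prototypes, which is one plausible way to realize classifier-level sub-class discrimination.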

Published

2023-06-26

How to Cite

Hu, X., & Wang, Z. (2023). Leveraging Sub-class Discrimination for Compositional Zero-Shot Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 37(1), 890-898. https://doi.org/10.1609/aaai.v37i1.25168

Section

AAAI Technical Track on Computer Vision I