Task-Independent Knowledge Makes for Transferable Representations for Generalized Zero-Shot Learning

Authors

  • Chaoqun Wang, School of Data Science, University of Science and Technology of China, Hefei, Anhui, China; The National Engineering Laboratory for Brain-inspired Intelligence Technology and Application, University of Science and Technology of China, Hefei, Anhui, China
  • Xuejin Chen, School of Data Science, University of Science and Technology of China, Hefei, Anhui, China; The National Engineering Laboratory for Brain-inspired Intelligence Technology and Application, University of Science and Technology of China, Hefei, Anhui, China
  • Shaobo Min, The National Engineering Laboratory for Brain-inspired Intelligence Technology and Application, University of Science and Technology of China, Hefei, Anhui, China
  • Xiaoyan Sun, The National Engineering Laboratory for Brain-inspired Intelligence Technology and Application, University of Science and Technology of China, Hefei, Anhui, China
  • Houqiang Li, School of Data Science, University of Science and Technology of China, Hefei, Anhui, China; The National Engineering Laboratory for Brain-inspired Intelligence Technology and Application, University of Science and Technology of China, Hefei, Anhui, China

DOI:

https://doi.org/10.1609/aaai.v35i3.16375

Keywords:

Object Detection & Categorization

Abstract

Generalized Zero-Shot Learning (GZSL) aims to recognize new categories by learning transferable image representations. Existing methods show that, by aligning image representations with their corresponding semantic labels, the semantic-aligned representations can transfer to unseen categories. However, supervised only by seen-category labels, the learned semantic knowledge is highly task-specific, which biases image representations toward seen categories. In this paper, we propose a novel Dual-Contrastive Embedding Network (DCEN) that simultaneously learns task-specific and task-independent knowledge via semantic alignment and instance discrimination. First, DCEN leverages task labels to cluster representations of the same semantic category, by cross-modal contrastive learning and by exploring semantic-visual complementarity. Beyond this task-specific knowledge, DCEN then introduces task-independent knowledge by attracting representations of different views of the same image and repelling representations of different images. Compared to high-level seen-category supervision, this instance-discrimination supervision encourages DCEN to capture low-level visual knowledge, which is less biased toward seen categories and thus alleviates the representation bias. Consequently, task-specific and task-independent knowledge jointly make for the transferable representations of DCEN, which achieves an average improvement of 4.1% on four public benchmarks.
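To make the two objectives concrete, the sketch below shows how an instance-discrimination loss and a cross-modal semantic-alignment loss could be combined into a single dual-contrastive objective. This is a minimal illustration assuming a PyTorch setup; the function names, tensor shapes, temperature, and weighting `alpha` are assumptions for exposition, not the authors' implementation.

```python
# Hypothetical sketch of the two contrastive objectives described in the
# abstract. Not the authors' code: all names and hyperparameters are assumed.
import torch
import torch.nn.functional as F

def info_nce(anchors, positives, temperature=0.1):
    """Generic InfoNCE loss: each anchor's positive is the same-index row
    of `positives`; every other row in the batch serves as a negative."""
    anchors = F.normalize(anchors, dim=1)
    positives = F.normalize(positives, dim=1)
    logits = anchors @ positives.t() / temperature  # (B, B) similarities
    targets = torch.arange(anchors.size(0), device=anchors.device)
    return F.cross_entropy(logits, targets)

def dual_contrastive_loss(view1, view2, img_emb, sem_emb, alpha=1.0):
    """Task-independent term: attract embeddings of two augmented views of
    the same image and repel other images (instance discrimination).
    Task-specific term: align image embeddings with the semantic (e.g.
    attribute) embeddings of their category (cross-modal contrast)."""
    loss_inst = info_nce(view1, view2)    # instance discrimination
    loss_sem = info_nce(img_emb, sem_emb) # semantic alignment
    return loss_sem + alpha * loss_inst
```

In practice, `view1` and `view2` would be encoder outputs for two augmentations of each image, and `sem_emb` the class-attribute embedding of each image's label; note that same-class images within a batch would act as false negatives for the alignment term unless batch sampling or masking accounts for them.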


Published

2021-05-18

How to Cite

Wang, C., Chen, X., Min, S., Sun, X., & Li, H. (2021). Task-Independent Knowledge Makes for Transferable Representations for Generalized Zero-Shot Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 35(3), 2710-2718. https://doi.org/10.1609/aaai.v35i3.16375

Issue

Vol. 35 No. 3 (2021)

Section

AAAI Technical Track on Computer Vision II