Composition-Incremental Learning for Compositional Generalization

Authors

  • Zhen Li, Beijing Key Laboratory of Intelligent Information Technology, School of Computer Science & Technology, Beijing Institute of Technology; Guangdong Laboratory of Machine Perception and Intelligent Computing, Shenzhen MSU-BIT University
  • Yuwei Wu, Beijing Key Laboratory of Intelligent Information Technology, School of Computer Science & Technology, Beijing Institute of Technology; Guangdong Laboratory of Machine Perception and Intelligent Computing, Shenzhen MSU-BIT University
  • Chenchen Jing, Zhejiang University of Technology
  • Che Sun, Guangdong Laboratory of Machine Perception and Intelligent Computing, Shenzhen MSU-BIT University
  • Chuanhao Li, Shanghai AI Laboratory; Beijing Key Laboratory of Intelligent Information Technology, School of Computer Science & Technology, Beijing Institute of Technology; Guangdong Laboratory of Machine Perception and Intelligent Computing, Shenzhen MSU-BIT University
  • Yunde Jia, Guangdong Laboratory of Machine Perception and Intelligent Computing, Shenzhen MSU-BIT University; Beijing Key Laboratory of Intelligent Information Technology, School of Computer Science & Technology, Beijing Institute of Technology

DOI:

https://doi.org/10.1609/aaai.v40i8.37605

Abstract

Compositional generalization in computer vision has achieved substantial progress on pre-collected training data. Real-world data, however, emerges continually, and the space of possible compositions is nearly infinite, long-tailed, and never fully observed. An ideal model should therefore improve its compositional generalization capability incrementally. In this paper, we explore Composition-Incremental Learning for Compositional Generalization (CompIL) in the context of the compositional zero-shot learning (CZSL) task, where models must continually learn new compositions and thereby progressively improve their compositional generalization capability. To evaluate CompIL quantitatively, we develop a benchmark construction pipeline that leverages existing datasets, yielding MIT-States-CompIL and C-GQA-CompIL. Furthermore, we propose a pseudo-replay framework that uses a visual synthesizer to synthesize visual representations of learned compositions and a linguistic primitive distillation mechanism to maintain aligned primitive representations throughout the learning process. Extensive experiments demonstrate the effectiveness of the proposed framework.

Published

2026-03-14

How to Cite

Li, Z., Wu, Y., Jing, C., Sun, C., Li, C., & Jia, Y. (2026). Composition-Incremental Learning for Compositional Generalization. Proceedings of the AAAI Conference on Artificial Intelligence, 40(8), 6735–6743. https://doi.org/10.1609/aaai.v40i8.37605

Section

AAAI Technical Track on Computer Vision V