Composition-Incremental Learning for Compositional Generalization

Authors

  • Zhen Li, Beijing Key Laboratory of Intelligent Information Technology, School of Computer Science & Technology, Beijing Institute of Technology; Guangdong Laboratory of Machine Perception and Intelligent Computing, Shenzhen MSU-BIT University
  • Yuwei Wu, Beijing Key Laboratory of Intelligent Information Technology, School of Computer Science & Technology, Beijing Institute of Technology; Guangdong Laboratory of Machine Perception and Intelligent Computing, Shenzhen MSU-BIT University
  • Chenchen Jing, Zhejiang University of Technology
  • Che Sun, Guangdong Laboratory of Machine Perception and Intelligent Computing, Shenzhen MSU-BIT University
  • Chuanhao Li, Shanghai AI Laboratory; Beijing Key Laboratory of Intelligent Information Technology, School of Computer Science & Technology, Beijing Institute of Technology; Guangdong Laboratory of Machine Perception and Intelligent Computing, Shenzhen MSU-BIT University
  • Yunde Jia, Guangdong Laboratory of Machine Perception and Intelligent Computing, Shenzhen MSU-BIT University; Beijing Key Laboratory of Intelligent Information Technology, School of Computer Science & Technology, Beijing Institute of Technology

DOI:

https://doi.org/10.1609/aaai.v40i8.37605

Abstract

Compositional generalization in computer vision has achieved substantial progress on pre-collected training data. Real-world data, however, emerges continually, and the space of possible compositions is nearly infinite, long-tailed, and never fully observed. An ideal model should therefore improve its compositional generalization capability incrementally. In this paper, we explore Composition-Incremental Learning for Compositional Generalization (CompIL) in the context of the compositional zero-shot learning (CZSL) task, where models must continually learn new compositions and thereby progressively improve their compositional generalization capability. To evaluate CompIL quantitatively, we develop a benchmark construction pipeline that leverages existing datasets, yielding MIT-States-CompIL and C-GQA-CompIL. Furthermore, we propose a pseudo-replay framework that uses a visual synthesizer to synthesize visual representations of learned compositions and a linguistic primitive distillation mechanism to maintain aligned primitive representations throughout the learning process. Extensive experiments demonstrate the effectiveness of the proposed framework.

Published

2026-03-14

How to Cite

Li, Z., Wu, Y., Jing, C., Sun, C., Li, C., & Jia, Y. (2026). Composition-Incremental Learning for Compositional Generalization. Proceedings of the AAAI Conference on Artificial Intelligence, 40(8), 6735–6743. https://doi.org/10.1609/aaai.v40i8.37605

Section

AAAI Technical Track on Computer Vision V