I2CD: An Invertible Causal Framework for Compositional Zero-Shot Learning via Disentangle-Compose-Disentangle

Authors

  • Zhaoquan Yuan, School of Computing and Artificial Intelligence, Southwest Jiaotong University; Manufacturing Industry Chain Collaboration Industrial Software Key Laboratory of Sichuan Province, Chengdu, China
  • Zining Wang, School of Computing and Artificial Intelligence, Southwest Jiaotong University
  • Yuankang Pan, School of Computing and Artificial Intelligence, Southwest Jiaotong University
  • Ao Luo, School of Computing and Artificial Intelligence, Southwest Jiaotong University
  • Wei Li, School of Computing and Artificial Intelligence, Southwest Jiaotong University
  • Xiao Wu, School of Computing and Artificial Intelligence, Southwest Jiaotong University
  • Changsheng Xu, State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, CAS, China

DOI:

https://doi.org/10.1609/aaai.v40i15.38221

Abstract

Compositional Zero-Shot Learning (CZSL) aims to recognize unseen attribute-object compositions in images, a fundamental problem in artificial intelligence. Current approaches, which primarily focus on semantic alignment or distribution independence of primitives, have not achieved effective state-object decoupling or causal interventional invariance, which limits their performance on unseen compositions. To address this, this study introduces I2CD (Invertible Causal framework via Disentangle-Compose-Disentangle), a novel framework that integrates invertible neural networks with causal intervention techniques to achieve state-object disentanglement. The framework employs a disentangle-compose-disentangle mechanism for counterfactual generation within the disentangled representation space, ensuring that modifications to one primitive (attribute or object) remain independent of the other, thus enabling robust causal disentanglement. Representational consistency is maintained by semantically aligning both the initial disentangled representations and their recomposed-then-disentangled counterparts with the corresponding textual concepts. Comprehensive evaluations on three benchmark datasets (MIT-States, UT-Zappos, and C-GQA) demonstrate the framework's effectiveness in achieving both disentanglement and compositional generalization in CZSL tasks.
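
The disentangle-compose-disentangle idea can be illustrated with a toy invertible map. The sketch below is not the authors' model: the abstract does not specify the architecture, losses, or text alignment. It assumes a hand-written two-step additive coupling transform (in the style of simple normalizing flows) with a fixed tanh shift standing in for learned networks, and the names `shift`, `disentangle`, `compose`, and `close` are hypothetical. It only shows why invertibility lets a counterfactual composition, built by pairing the state code of one sample with the object code of another, be disentangled again into exactly those two codes.

```python
import math

def shift(values, scale):
    # Fixed nonlinear shift; a stand-in (assumption) for a learned sub-network.
    return [scale * math.tanh(v) for v in values]

def disentangle(x):
    """Invertible map from an even-length feature vector to (state_code, object_code)."""
    half = len(x) // 2
    a, b = x[:half], x[half:]
    b = [bi + si for bi, si in zip(b, shift(a, 0.5))]    # update second half from first
    a = [ai + si for ai, si in zip(a, shift(b, -0.8))]   # update first half from second
    return a, b

def compose(state_code, object_code):
    """Exact inverse of disentangle: undo the two coupling steps in reverse order."""
    a = [ai - si for ai, si in zip(state_code, shift(object_code, -0.8))]
    b = [bi - si for bi, si in zip(object_code, shift(a, 0.5))]
    return a + b

def close(u, v):
    # Element-wise comparison with a small tolerance for floating-point error.
    return len(u) == len(v) and all(math.isclose(p, q, abs_tol=1e-9) for p, q in zip(u, v))

# Two hypothetical image feature vectors.
x1 = [0.2, -1.0, 0.5, 0.3]
x2 = [1.5, 0.1, -0.7, 2.0]

# 1) Disentangle both samples.
s1, o1 = disentangle(x1)
s2, o2 = disentangle(x2)

# 2) Compose a counterfactual: state of sample 1 with object of sample 2.
x_cf = compose(s1, o2)

# 3) Disentangle the counterfactual and compare with the codes that built it.
s_cf, o_cf = disentangle(x_cf)
print(close(s_cf, s1), close(o_cf, o2))
```

In this toy setting the second disentangling step returns the swapped-in codes by construction, because `compose` is the exact inverse of `disentangle`. In a trained model, the corresponding consistency would presumably be encouraged through the alignment with textual concepts that the abstract describes.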

Published

2026-03-14

How to Cite

Yuan, Z., Wang, Z., Pan, Y., Luo, A., Li, W., Wu, X., & Xu, C. (2026). I2CD: An Invertible Causal Framework for Compositional Zero-Shot Learning via Disentangle-Compose-Disentangle. Proceedings of the AAAI Conference on Artificial Intelligence, 40(15), 12295–12303. https://doi.org/10.1609/aaai.v40i15.38221

Section

AAAI Technical Track on Computer Vision XII