I2CD: An Invertible Causal Framework for Compositional Zero-Shot Learning via Disentangle-Compose-Disentangle

Zhaoquan Yuan; Zining Wang; Yuankang Pan; Ao Luo; Wei Li; Xiao Wu; Changsheng Xu

doi:10.1609/aaai.v40i15.38221

Authors

Zhaoquan Yuan, School of Computing and Artificial Intelligence, Southwest Jiaotong University; Manufacturing Industry Chain Collaboration Industrial Software Key Laboratory of Sichuan Province, Chengdu, China
Zining Wang, School of Computing and Artificial Intelligence, Southwest Jiaotong University
Yuankang Pan, School of Computing and Artificial Intelligence, Southwest Jiaotong University
Ao Luo, School of Computing and Artificial Intelligence, Southwest Jiaotong University
Wei Li, School of Computing and Artificial Intelligence, Southwest Jiaotong University
Xiao Wu, School of Computing and Artificial Intelligence, Southwest Jiaotong University
Changsheng Xu, State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, CAS, China

Abstract

Compositional Zero-Shot Learning (CZSL) aims to recognize unseen attribute-object compositions in images, a fundamental challenge in artificial intelligence. Current approaches, which primarily focus on semantic alignment or distribution independence of primitives, have not achieved effective state-object decoupling or causal interventional invariance, which limits their performance on unseen compositions. To address this gap, this study introduces I2CD (Invertible Causal framework via Disentangle-Compose-Disentangle), a novel framework that integrates invertible neural networks with causal intervention techniques to achieve state-object disentanglement. The framework employs a disentangle-compose-disentangle mechanism for counterfactual generation within the disentangled representation space, ensuring that a modification to one primitive (attribute or object) remains independent of the other, thus enabling robust causal disentanglement. Representational consistency is maintained by semantically aligning both the initial disentangled representations and their recomposed-then-disentangled counterparts with the corresponding textual concepts. Comprehensive evaluations on three benchmark datasets (MIT-States, UT-Zappos, and C-GQA) demonstrate the framework's effectiveness in achieving both disentanglement and compositional generalization in CZSL tasks.
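
The disentangle-compose-disentangle cycle described in the abstract can be illustrated with a toy invertible map. The sketch below is not the authors' implementation: the abstract does not specify the architecture or losses, so it assumes a single additive coupling layer, and it simply labels the two halves of the latent vector "attribute" and "object" (in a trained model that assignment would have to be learned). All function names are hypothetical. It shows only the structural property that an invertible map provides: after swapping the attribute part between two samples and composing, a second disentangling step recovers the swapped parts.

```python
import math
import random


def make_weights(dim, seed=0):
    """Random weights for a toy additive coupling layer (illustrative only)."""
    rng = random.Random(seed)
    half = dim // 2
    return [[rng.uniform(-1.0, 1.0) for _ in range(half)] for _ in range(half)]


def _shift(weights, half_vec):
    # Nonlinear function of one half; it does not need to be invertible itself.
    return [math.tanh(sum(w * v for w, v in zip(row, half_vec))) for row in weights]


def disentangle(x, weights):
    """Forward pass: feature vector -> (attribute part, object part)."""
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    shift = _shift(weights, x1)
    z_attr = list(x1)
    z_obj = [b + s for b, s in zip(x2, shift)]
    return z_attr, z_obj


def compose(z_attr, z_obj, weights):
    """Inverse pass: (attribute part, object part) -> feature vector."""
    shift = _shift(weights, z_attr)
    x2 = [b - s for b, s in zip(z_obj, shift)]
    return list(z_attr) + x2


def counterfactual_cycle(x_a, x_b, weights):
    """Disentangle both inputs, swap the attribute part, compose, disentangle again."""
    _, obj_a = disentangle(x_a, weights)
    attr_b, _ = disentangle(x_b, weights)
    x_cf = compose(attr_b, obj_a, weights)       # counterfactual feature
    attr_cf, obj_cf = disentangle(x_cf, weights)  # second disentangling step
    return (attr_b, obj_a), (attr_cf, obj_cf)


weights = make_weights(dim=4, seed=0)
intended, recovered = counterfactual_cycle(
    [0.1, 0.2, 0.3, 0.4], [0.9, -0.5, 0.0, 0.7], weights
)
# Up to floating-point error, the recovered parts equal the intended ones.
max_gap = max(abs(p - q) for a, b in zip(intended, recovered) for p, q in zip(a, b))
```

In this toy setting the agreement between `intended` and `recovered` follows directly from invertibility. Semantic alignment with textual concepts and the causal intervention objective mentioned in the abstract are not modeled here.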
