Deconfounding Physical Dynamics with Global Causal Relation and Confounder Transmission for Counterfactual Prediction

Zongzhao Li; Xiangyu Zhu; Zhen Lei; Zhaoxiang Zhang

doi:10.1609/aaai.v36i2.20044

Authors

Zongzhao Li NLPR & CBSR, Institute of Automation, Chinese Academy of Sciences School of Artificial Intelligence, University of Chinese Academy of Sciences
Xiangyu Zhu NLPR & CBSR, Institute of Automation, Chinese Academy of Sciences School of Artificial Intelligence, University of Chinese Academy of Sciences
Zhen Lei NLPR & CBSR, Institute of Automation, Chinese Academy of Sciences School of Artificial Intelligence, University of Chinese Academy of Sciences Centre for Artificial Intelligence and Robotics, Hong Kong Institute of Science & Innovation, Chinese Academy of Sciences
Zhaoxiang Zhang NLPR & CBSR, Institute of Automation, Chinese Academy of Sciences School of Artificial Intelligence, University of Chinese Academy of Sciences Centre for Artificial Intelligence and Robotics, Hong Kong Institute of Science & Innovation, Chinese Academy of Sciences

DOI:

https://doi.org/10.1609/aaai.v36i2.20044

Keywords:

Computer Vision (CV), Reasoning Under Uncertainty (RU)

Abstract

Discovering the underneath causal relations is the fundamental ability for reasoning about the surrounding environment and predicting the future states in the physical world. Counterfactual prediction from visual input, which requires simulating future states based on unrealized situations in the past, is a vital component in causal relation tasks. In this paper, we work on the confounders that have effect on the physical dynamics, including masses, friction coefficients, etc., to bridge relations between the intervened variable and the affected variable whose future state may be altered. We propose a neural network framework combining Global Causal Relation Attention (GCRA) and Confounder Transmission Structure (CTS). The GCRA looks for the latent causal relations between different variables and estimates the confounders by capturing both spatial and temporal information. The CTS integrates and transmits the learnt confounders in a residual way, so that the estimated confounders can be encoded into the network as a constraint for object positions when performing counterfactual prediction. Without any access to ground truth information about confounders, our model outperforms the state-of-the-art method on various benchmarks by fully utilizing the constraints of confounders. Extensive experiments demonstrate that our model can generalize to unseen environments and maintain good performance.

Deconfounding Physical Dynamics with Global Causal Relation and Confounder Transmission for Counterfactual Prediction

Authors

DOI:

Keywords:

Abstract

Downloads

Published

How to Cite

Issue

Section

Information