Causal Discovery from Multiple Data Sets with Non-Identical Variable Sets

Biwei Huang; Kun Zhang; Mingming Gong; Clark Glymour

doi:10.1609/aaai.v34i06.6575

Authors

Biwei Huang Carnegie Mellon University
Kun Zhang Carnegie Mellon University
Mingming Gong University of Melbourne
Clark Glymour Carnegie Mellon University

DOI:

https://doi.org/10.1609/aaai.v34i06.6575

Abstract

Many approaches to causal discovery assume that there are no hidden confounders and are designed to learn a fixed causal model from a single data set. Over the last decade, closer cooperation across laboratories has made it possible to accumulate more variables and data for analysis, although each lab may measure only a subset of the variables because of technical constraints or to save time and cost. This raises the question of how to perform causal discovery from multiple data sets with non-identical variable sets, and of how the additional recorded variables can help mitigate the confounding problem. In this paper, we propose a principled method to uniquely identify causal relationships over the integrated set of variables from multiple data sets in the linear, non-Gaussian case. The proposed method also allows for distribution shifts across data sets. Theoretically, we show that the causal structure over the integrated set of variables is identifiable under testable conditions. Furthermore, we present two approaches to parameter estimation: one is based on maximum likelihood, and the other is likelihood-free and leverages generative adversarial nets to improve the scalability of the estimation procedure. Experimental results on various synthetic and real-world data sets demonstrate the efficacy of our methods.
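To make the setting in the abstract concrete, the following is a minimal sketch of the kind of data it describes: two data sets drawn from one linear, non-Gaussian causal model, each recording a different but overlapping subset of variables, with a shift in the noise distribution between them. It only illustrates the problem setup and is not the authors' identification or estimation method. The function name `simulate_overlapping_datasets`, the three-variable chain, the coefficients, the Laplace noise, and the noise scales are all assumptions chosen for illustration.

```python
import numpy as np


def simulate_overlapping_datasets(n=2000, seed=0):
    """Simulate two data sets with non-identical variable sets.

    Hypothetical ground truth: a chain X1 -> X2 -> X3 with linear
    relations and Laplace (non-Gaussian) noise. All numbers here are
    illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)

    # B[i, j] is the direct effect of variable j on variable i.
    B = np.array([
        [0.0, 0.0, 0.0],
        [0.8, 0.0, 0.0],   # X2 = 0.8 * X1 + E2
        [0.0, -0.6, 0.0],  # X3 = -0.6 * X2 + E3
    ])

    # Structural model X = B X + E, so X = (I - B)^{-1} E.
    mixing = np.linalg.inv(np.eye(3) - B)

    def sample(noise_scale):
        # Rows are samples, so the column-vector model is transposed.
        noise = rng.laplace(scale=noise_scale, size=(n, 3))
        return noise @ mixing.T

    full_a = sample(noise_scale=1.0)
    # Assumed distribution shift: the second lab has a larger noise scale.
    full_b = sample(noise_scale=2.0)

    data_a = full_a[:, [0, 1]]  # lab A records only X1 and X2
    data_b = full_b[:, [1, 2]]  # lab B records only X2 and X3
    return B, data_a, data_b
```

In this sketch X1 and X3 are never recorded together, so their relationship can only be assessed through the shared variable X2, which is the situation the abstract addresses when it speaks of causal discovery over the integrated set of variables.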
