Identifying Selection Bias from Observational Data

Authors

  • David Kaltenpoth, CISPA Helmholtz Center for Information Security
  • Jilles Vreeken, CISPA Helmholtz Center for Information Security

DOI:

https://doi.org/10.1609/aaai.v37i7.25987

Keywords:

ML: Causal Learning, RU: Causality

Abstract

Access to a representative sample from the population is an assumption that underpins all of machine learning. Selection effects can instead cause observations to come from a subpopulation, so that our inferences may be biased. It is therefore important to know whether a sample is affected by selection effects. We study the conditions under which selection bias can be identified and give results for both parametric and non-parametric families of distributions. Based on these results, we develop two practical methods to determine whether an observed sample comes from a distribution subject to selection bias. Through extensive evaluation on synthetic and real-world data, we verify that our methods beat the state of the art in both detecting and characterizing selection bias.
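
To make the notion of a selection effect concrete, the following minimal Python sketch simulates one. It is an illustration only and is not either of the paper's two methods: the selection rule (a unit is observed only when x + y > 0) and the names `simulate` and `pearson` are assumptions introduced for this example. Two variables that are independent in the full population become negatively correlated in the selected subpopulation, which is the kind of bias the abstract refers to.

```python
import math
import random


def simulate(n=20000, seed=0):
    """Draw an unbiased population and a selected subsample (hypothetical setup)."""
    rng = random.Random(seed)
    # Population: x and y are independent standard normal variables.
    population = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
    # Assumed selection mechanism: a unit is observed only if x + y > 0.
    selected = [(x, y) for x, y in population if x + y > 0]
    return population, selected


def pearson(pairs):
    """Sample Pearson correlation coefficient of a list of (x, y) pairs."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    population, selected = simulate()
    print("correlation in full population:", round(pearson(population), 3))
    print("correlation after selection:   ", round(pearson(selected), 3))
```

An analyst who sees only `selected` would infer a dependence between x and y that does not exist in the population. Deciding from such a sample alone whether selection took place is the identification problem the paper studies.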

Published

2023-06-26

How to Cite

Kaltenpoth, D., & Vreeken, J. (2023). Identifying Selection Bias from Observational Data. Proceedings of the AAAI Conference on Artificial Intelligence, 37(7), 8177-8185. https://doi.org/10.1609/aaai.v37i7.25987

Section

AAAI Technical Track on Machine Learning II