Towards Safe Policy Learning under Partial Identifiability: A Causal Approach

Authors

  • Shalmali Joshi, Columbia University
  • Junzhe Zhang, Columbia University
  • Elias Bareinboim, Columbia University

DOI:

https://doi.org/10.1609/aaai.v38i12.29198

Keywords:

ML: Causal Learning, ML: Reinforcement Learning, RU: Causality, RU: Graphical Models

Abstract

Learning personalized treatment policies is a formative challenge in many real-world applications, including healthcare, econometrics, and artificial intelligence. However, the effectiveness of candidate policies is not always identifiable, i.e., it is not uniquely computable from the combination of the available data and assumptions about the generating mechanisms. This paper studies policy learning from data collected in various non-identifiable settings, namely (1) observational studies with unobserved confounding; (2) randomized experiments with partial observability; and (3) their combinations. We derive sharp, closed-form bounds on the conditional treatment effects from observational and experimental data. Based on these novel bounds, we further characterize the problem of safe policy learning and develop an algorithm that trains, from data, a policy guaranteed to achieve at least the performance of the baseline policy currently deployed. Finally, we validate our proposed algorithm on synthetic data and a large clinical trial, demonstrating that it guarantees safe behaviors and robust performance.
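
To make the notion of safety under partial identifiability concrete, the following is a minimal sketch of one conservative decision rule, not the algorithm proposed in the paper. It assumes a binary treatment, a discrete context, and interval bounds on the conditional treatment effect E[Y_1 - Y_0 | z] that have already been computed by some method. The function name `safe_policy`, the dictionary layout, and the numeric bounds are all hypothetical and chosen only for illustration. The rule departs from the baseline action only when every effect value inside the interval favors the switch, so under these assumptions the resulting policy cannot do worse than the baseline in any context.

```python
from typing import Dict, Hashable, Tuple

Bounds = Dict[Hashable, Tuple[float, float]]


def safe_policy(cate_bounds: Bounds, baseline: Dict[Hashable, int]) -> Dict[Hashable, int]:
    """Pick an action per context, deviating from the baseline only when safe.

    cate_bounds[z] = (lo, hi) is an assumed interval containing the
    conditional effect E[Y_1 - Y_0 | z]; baseline[z] is the action (0 or 1)
    that the deployed policy takes in context z.
    """
    policy = {}
    for z, (lo, hi) in cate_bounds.items():
        if lo > hi:
            raise ValueError(f"invalid interval for context {z!r}")
        current = baseline[z]
        if current == 0 and lo > 0:
            # Even the worst-case effect of treating is positive: switch to 1.
            policy[z] = 1
        elif current == 1 and hi < 0:
            # Even the best-case effect of treating is negative: switch to 0.
            policy[z] = 0
        else:
            # The interval does not rule out harm from switching: keep baseline.
            policy[z] = current
    return policy


# Hypothetical bounds for three contexts (illustrative numbers, not results).
example_bounds = {"a": (0.1, 0.4), "b": (-0.2, 0.3), "c": (-0.5, -0.1)}
example_baseline = {"a": 0, "b": 0, "c": 1}
learned = safe_policy(example_bounds, example_baseline)
```

In this sketch, context "b" keeps the baseline action because its interval contains zero, which is the typical price of a safety guarantee when the effect is only partially identified.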

Published

2024-03-24

How to Cite

Joshi, S., Zhang, J., & Bareinboim, E. (2024). Towards Safe Policy Learning under Partial Identifiability: A Causal Approach. Proceedings of the AAAI Conference on Artificial Intelligence, 38(12), 13004-13012. https://doi.org/10.1609/aaai.v38i12.29198

Section

AAAI Technical Track on Machine Learning III