Towards Safe Policy Learning under Partial Identifiability: A Causal Approach
DOI:
https://doi.org/10.1609/aaai.v38i12.29198
Keywords:
ML: Causal Learning, ML: Reinforcement Learning, RU: Causality, RU: Graphical Models
Abstract
Learning personalized treatment policies is a formidable challenge in many real-world applications, including healthcare, econometrics, and artificial intelligence. However, the effectiveness of candidate policies is not always identifiable, i.e., it is not uniquely computable from the combination of the available data and assumptions about the generating mechanisms. This paper studies policy learning from data collected in various non-identifiable settings: (1) observational studies with unobserved confounding; (2) randomized experiments with partial observability; and (3) their combinations. We derive sharp, closed-form bounds on the conditional treatment effects from observational and experimental data. Based on these novel bounds, we further characterize the problem of safe policy learning and develop an algorithm that trains a policy from data guaranteed to achieve at least the performance of the baseline policy currently deployed. Finally, we validate our proposed algorithm on synthetic data and a large clinical trial, demonstrating that it guarantees safe behaviors and robust performance.
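To illustrate the general idea (not the paper's specific derivation), the classic Manski-style natural bounds show how an interventional mean can be bounded, rather than point-identified, from confounded observational data, and how a worst-case comparison yields a "safe" treatment rule in the spirit of the abstract. All numbers below are hypothetical.

```python
# Illustrative sketch: Manski-style natural bounds on E[Y | do(X=x)]
# from observational data with possible unobserved confounding,
# for binary treatment X and outcome Y in {0, 1}.

def natural_bounds(p_y1_given_x, p_x):
    """Bounds on E[Y | do(X=x)].

    On the stratum actually receiving x (probability p_x) the outcome is
    observed; on the remaining stratum the counterfactual outcome is
    unknown and can lie anywhere in [0, 1].
    """
    lower = p_y1_given_x * p_x               # unobserved stratum contributes 0
    upper = p_y1_given_x * p_x + (1 - p_x)   # unobserved stratum contributes 1
    return lower, upper

# Hypothetical observational quantities:
# P(X=1) = 0.4, P(Y=1 | X=1) = 0.8, P(Y=1 | X=0) = 0.5.
lo_treat, hi_treat = natural_bounds(0.8, 0.4)  # bounds for do(X=1)
lo_ctrl, hi_ctrl = natural_bounds(0.5, 0.6)    # bounds for do(X=0)

# A conservative rule in the spirit of safe policy learning: switch to the
# treatment only when its worst case beats the baseline's best case.
recommend_treatment = lo_treat > hi_ctrl
print((lo_treat, hi_treat), (lo_ctrl, hi_ctrl), recommend_treatment)
```

Here the bounds overlap, so the conservative rule keeps the baseline; the paper's contribution is deriving sharper bounds by combining observational and experimental data, which can shrink such intervals enough to certify an improvement.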
Published
2024-03-24
How to Cite
Joshi, S., Zhang, J., & Bareinboim, E. (2024). Towards Safe Policy Learning under Partial Identifiability: A Causal Approach. Proceedings of the AAAI Conference on Artificial Intelligence, 38(12), 13004-13012. https://doi.org/10.1609/aaai.v38i12.29198
Issue
Section
AAAI Technical Track on Machine Learning III