Imbalance-Aware Uplift Modeling for Observational Data
Keywords: Machine Learning (ML), Data Mining & Knowledge Management (DMKM)
Abstract
Uplift modeling aims to model the incremental impact of a treatment on an individual outcome, and it has attracted great interest from researchers and practitioners in different communities. Existing uplift modeling methods rely either on data collected from randomized controlled trials (RCTs) or on observational data, which is the more realistic setting. However, we notice that in observational data it is often the case that only a small number of subjects receive the treatment, yet the uplift must ultimately be inferred for a much larger group of subjects. Such highly imbalanced data is common in fields such as marketing and medical treatment, but it is rarely handled by existing work. In this paper, we show theoretically and quantitatively that the existing representative methods, transformed outcome (TOM) and doubly robust (DR), suffer from large bias and deviation on highly imbalanced datasets with skewed propensity scores, mainly because they are proportional to the reciprocal of the propensity score. To reduce the bias and deviation of uplift modeling on an imbalanced dataset, we propose an imbalance-aware uplift modeling (IAUM) method that constructs a robust proxy outcome, which adaptively combines the doubly robust estimator and the imputed treatment effects based on the propensity score. We theoretically prove that IAUM obtains a better bias-variance trade-off than existing methods on a highly imbalanced dataset. We conduct extensive experiments on a synthetic dataset and two real-world datasets, and the experimental results demonstrate the superiority of our method over state-of-the-art methods.
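The dependence on the reciprocal of the propensity score can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes the standard transformed-outcome form Y·T/e − Y·(1−T)/(1−e), a constant known propensity e, a constant true uplift `tau`, and unit-variance Gaussian noise, all of which are simplifying assumptions chosen for illustration. The function names `transformed_outcome` and `simulate` are hypothetical.

```python
import numpy as np


def transformed_outcome(y, t, e):
    """Standard transformed outcome: Y*T/e - Y*(1-T)/(1-e).

    Its expectation equals the treatment effect when e is the true
    propensity, but each term scales with 1/e or 1/(1-e).
    """
    return y * t / e - y * (1 - t) / (1 - e)


def simulate(e, n=200_000, tau=1.0, seed=0):
    """Simulate a toy dataset (assumed setup, not the paper's data).

    Treatment is assigned with constant probability e, and the outcome
    is tau * T plus standard normal noise. Returns the sample mean and
    sample variance of the transformed outcome.
    """
    rng = np.random.default_rng(seed)
    t = rng.binomial(1, e, size=n)
    y = tau * t + rng.normal(0.0, 1.0, size=n)
    z = transformed_outcome(y, t, e)
    return z.mean(), z.var()


if __name__ == "__main__":
    # Smaller e means fewer treated subjects, i.e. a more imbalanced dataset.
    for e in (0.5, 0.1, 0.01):
        mean, var = simulate(e)
        print(f"e={e:<5} mean={mean:.3f} variance={var:.2f}")
```

Under these assumptions the mean of the transformed outcome stays near `tau` for every `e`, while its variance grows roughly like 1/e as the treated fraction shrinks, which is the kind of instability the abstract attributes to TOM and DR on imbalanced data.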
How to Cite
Chen, X., Liu, Z., Yu, L., Yao, L., Zhang, W., Dong, Y., Gu, L., Zeng, X., Tan, Y., & Gu, J. (2022). Imbalance-Aware Uplift Modeling for Observational Data. Proceedings of the AAAI Conference on Artificial Intelligence, 36(6), 6313-6321. https://doi.org/10.1609/aaai.v36i6.20581
AAAI Technical Track on Machine Learning I