Predictive Off-Policy Policy Evaluation for Nonstationary Decision Problems, with Applications to Digital Marketing

Authors

  • Philip S. Thomas, Carnegie Mellon University
  • Georgios Theocharous, Adobe Research
  • Mohammad Ghavamzadeh, Adobe Research
  • Ishan Durugkar, University of Massachusetts Amherst
  • Emma Brunskill, Carnegie Mellon University

DOI

https://doi.org/10.1609/aaai.v31i2.19104

Abstract

In this paper, we consider the problem of evaluating one digital marketing policy (or, more generally, a policy for an MDP with unknown transition and reward functions) using data collected from the execution of a different policy. We call this problem off-policy policy evaluation. Existing methods for off-policy policy evaluation assume that the transition and reward functions of the MDP are stationary, an assumption that is typically false, particularly in digital marketing applications. As a result, existing off-policy policy evaluation methods are reactive to nonstationarity: they correct for changes only slowly, after the changes have occurred. We argue that off-policy policy evaluation for nonstationary MDPs can be phrased as a time series prediction problem, which yields predictive methods that can anticipate changes before they happen. We therefore propose a synthesis of existing off-policy policy evaluation methods with existing time series prediction methods, which we show results in a drastic reduction of mean squared error when evaluating policies on a real digital marketing data set.
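The synthesis described above can be sketched concretely. The code below is a minimal, hypothetical illustration, not the authors' algorithm: it forms a time series of ordinary importance sampling estimates of the evaluation policy's performance, one per logged episode, and then forecasts future performance by fitting a simple linear trend (a stand-in for the richer time series models one might use in practice). The names pi_e, pi_b, and predictive_estimate are assumptions made for illustration.

```python
import numpy as np

def per_episode_is_estimates(episodes, pi_e, pi_b):
    """Ordinary importance sampling (IS) estimate of the evaluation
    policy's return for each logged episode.

    episodes: list of episodes, each a list of (state, action, reward).
    pi_e(s, a), pi_b(s, a): action probabilities under the evaluation
    and behavior policies (the behavior probabilities are assumed known,
    as is standard in off-policy policy evaluation).
    """
    estimates = []
    for episode in episodes:
        rho = 1.0  # cumulative importance weight for the episode
        ret = 0.0  # undiscounted return of the episode
        for s, a, r in episode:
            rho *= pi_e(s, a) / pi_b(s, a)
            ret += r
        estimates.append(rho * ret)
    return np.array(estimates)

def predictive_estimate(estimates, horizon=1):
    """Forecast the evaluation policy's performance `horizon` episodes
    ahead by fitting a linear trend to the past per-episode estimates.
    A purely reactive method would instead return estimates.mean()."""
    t = np.arange(len(estimates))
    slope, intercept = np.polyfit(t, estimates, deg=1)
    return slope * (len(estimates) - 1 + horizon) + intercept
```

Under nonstationarity, the trend-based forecast anticipates drift in the policy's value, whereas averaging all past estimates lags behind it.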

Published

2017-02-11

How to Cite

Thomas, P., Theocharous, G., Ghavamzadeh, M., Durugkar, I., & Brunskill, E. (2017). Predictive Off-Policy Policy Evaluation for Nonstationary Decision Problems, with Applications to Digital Marketing. Proceedings of the AAAI Conference on Artificial Intelligence, 31(2), 4740-4745. https://doi.org/10.1609/aaai.v31i2.19104