Markov Decision Processes with Time-Varying Geometric Discounting

Authors

  • Jiarui Gan, University of Oxford
  • Annika Hennes, Heinrich-Heine-University Düsseldorf
  • Rupak Majumdar, Max Planck Institute for Software Systems
  • Debmalya Mandal, Max Planck Institute for Software Systems
  • Goran Radanovic, Max Planck Institute for Software Systems

DOI:

https://doi.org/10.1609/aaai.v37i10.26413

Keywords:

PRS: Planning With Markov Models (MDPs, POMDPs), GTEP: Game Theory, RU: Sequential Decision Making

Abstract

Canonical models of Markov decision processes (MDPs) usually consider geometric discounting based on a constant discount factor. While this standard modeling approach has led to many elegant results, some recent studies indicate the necessity of modeling time-varying discounting in certain applications. This paper studies a model of infinite-horizon MDPs with time-varying discount factors. We take a game-theoretic perspective – whereby each time step is treated as an independent decision maker with their own (fixed) discount factor – and we study the subgame perfect equilibrium (SPE) of the resulting game as well as the related algorithmic problems. We present a constructive proof of the existence of an SPE and demonstrate the EXPTIME-hardness of computing an SPE. We also turn to the approximate notion of epsilon-SPE and show that an epsilon-SPE exists under milder assumptions. We present an algorithm that computes an epsilon-SPE and provide an upper bound on its time complexity as a function of the convergence property of the time-varying discount factor.
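
The game-theoretic view described in the abstract can be illustrated with a minimal sketch. It assumes that the decision maker at step t discounts a reward received at step k >= t by gamma_t ** (k - t), and it truncates the infinite horizon to a short, fixed reward sequence. The function name `agent_value`, the reward sequence, and the discount factors are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def agent_value(rewards, gamma_t, t):
    """Value seen by the decision maker at step t (illustrative assumption).

    The step-t agent applies its own fixed discount factor gamma_t
    geometrically to every reward from step t onward.
    """
    return sum(gamma_t ** (k - t) * r for k, r in enumerate(rewards) if k >= t)


# Hypothetical finite reward sequence (a truncation of an infinite-horizon run).
rewards = [1.0, 1.0, 1.0, 1.0]

# Hypothetical time-varying discount factors, one per time step.
gammas = [0.5, 0.9, 0.9, 0.9]

# Each time step evaluates the same future with its own discount factor.
values = [agent_value(rewards, gammas[t], t) for t in range(len(rewards))]

for t, v in enumerate(values):
    print(f"step {t}: gamma = {gammas[t]}, value = {v:.4f}")
```

Because the step-0 agent (gamma = 0.5) and the step-1 agent (gamma = 0.9) weigh the same future rewards differently, their preferences over future actions can disagree, which is why the setting is analyzed as a game between time steps rather than as a single optimization problem.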

Published

2023-06-26

How to Cite

Gan, J., Hennes, A., Majumdar, R., Mandal, D., & Radanovic, G. (2023). Markov Decision Processes with Time-Varying Geometric Discounting. Proceedings of the AAAI Conference on Artificial Intelligence, 37(10), 11980-11988. https://doi.org/10.1609/aaai.v37i10.26413

Section

AAAI Technical Track on Planning, Routing, and Scheduling