An Improved Lower Bound on the Length of Locally-Improving Policy Sequences in MDPs with Large Action Sets

Authors

  • Pratyush Agarwal Indian Institute of Technology Bombay
  • Mulinti Shaik Wajid Indian Institute of Technology Bombay
  • Shivaram Kalyanakrishnan Indian Institute of Technology Bombay

DOI:

https://doi.org/10.1609/icaps.v35i1.36095

Abstract

Popular algorithms to solve Markov Decision Problems (MDPs) include policy iteration and the Simplex method (executed on an induced linear program). Each run of these algorithms can be associated with a sequence of "locally-improving" policies for the input MDP. For integers n >= 2, k >= 2, let f(n, k) denote the longest possible sequence of locally-improving policies for any MDP with n states and k actions per state. An alternative view of f(n, k) is as a descriptive structural property of the policy space of MDPs: it is the largest possible "c-height" in an induced "LP-digraph" of any n-state, k-action MDP. How large can f(n, k) be? A trivial upper bound on f(n, k) is the total number of (Markovian, deterministic) policies, which is k^{n}. A construction from Melekopoglou and Condon (1994) shows that f(n, 2) = 2^{n}, implying that the trivial upper bound is tight for k = 2. For k >= 3, the tightest lower bound on f(n, k) in the current literature is only Omega(k^{n / 2}) (Ashutosh et al., 2020). In this paper, we propose a family of MDPs to show a lower bound of Omega( (floor(k / 2) )^{n}) on f(n, k)---giving a exponential-in-n tightening for each k >= 6. Our investigation brings out technical challenges that do not arise for k = 2. Our result still leaves open the important question of whether f(n, k) is indeed k^{n} for n >= 2, k >= 2. We furnish an affirmative answer for the special case of n = 2, k >= 2.

Downloads

Published

2025-09-16

How to Cite

Agarwal, P., Wajid, M. S., & Kalyanakrishnan, S. (2025). An Improved Lower Bound on the Length of Locally-Improving Policy Sequences in MDPs with Large Action Sets. Proceedings of the International Conference on Automated Planning and Scheduling, 35(1), 2-10. https://doi.org/10.1609/icaps.v35i1.36095