An Improved Lower Bound on the Length of Locally-Improving Policy Sequences in MDPs with Large Action Sets
DOI: https://doi.org/10.1609/icaps.v35i1.36095
Abstract
Popular algorithms to solve Markov Decision Problems (MDPs) include policy iteration and the Simplex method (executed on an induced linear program). Each run of these algorithms can be associated with a sequence of "locally-improving" policies for the input MDP. For integers n >= 2, k >= 2, let f(n, k) denote the length of the longest possible sequence of locally-improving policies for any MDP with n states and k actions per state. An alternative view of f(n, k) is as a descriptive structural property of the policy space of MDPs: it is the largest possible "c-height" in an induced "LP-digraph" of any n-state, k-action MDP. How large can f(n, k) be? A trivial upper bound on f(n, k) is the total number of (Markovian, deterministic) policies, which is k^{n}. A construction from Melekopoglou and Condon (1994) shows that f(n, 2) = 2^{n}, implying that the trivial upper bound is tight for k = 2. For k >= 3, the tightest lower bound on f(n, k) in the current literature is only Omega(k^{n / 2}) (Ashutosh et al., 2020). In this paper, we propose a family of MDPs to show a lower bound of Omega( (floor(k / 2) )^{n}) on f(n, k)---giving an exponential-in-n tightening for each k >= 6. Our investigation brings out technical challenges that do not arise for k = 2. Our result still leaves open the important question of whether f(n, k) is indeed k^{n} for n >= 2, k >= 2. We furnish an affirmative answer for the special case of n = 2, k >= 2.
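To make the notion of a locally-improving policy sequence concrete, the following is a minimal sketch of greedy policy iteration on a hypothetical 2-state, 2-action MDP. The transition probabilities, rewards, and discount factor are made-up illustrative values, not taken from the paper's construction; the sequence of policies the loop visits is one example of a locally-improving sequence (here of length at most k^n = 4).

```python
import numpy as np

GAMMA = 0.9  # discount factor (illustrative choice)

# Hypothetical MDP: P[s, a, s'] = transition probability, R[s, a] = reward.
P = np.array([
    [[0.5, 0.5], [0.2, 0.8]],   # state 0, actions 0 and 1
    [[0.9, 0.1], [0.3, 0.7]],   # state 1, actions 0 and 1
])
R = np.array([
    [0.0, 1.0],
    [2.0, 0.5],
])

def evaluate(policy):
    """Solve (I - gamma * P_pi) V = R_pi exactly for the policy's value."""
    n = P.shape[0]
    P_pi = np.array([P[s, policy[s]] for s in range(n)])
    R_pi = np.array([R[s, policy[s]] for s in range(n)])
    return np.linalg.solve(np.eye(n) - GAMMA * P_pi, R_pi)

def policy_iteration(policy):
    """Greedy policy iteration; returns the sequence of policies visited."""
    seq = [tuple(policy)]
    while True:
        V = evaluate(policy)
        Q = R + GAMMA * P @ V          # Q[s, a] = one-step lookahead value
        greedy = Q.argmax(axis=1)
        if np.array_equal(greedy, policy):
            return seq                 # no improving action anywhere: optimal
        policy = greedy
        seq.append(tuple(policy))

seq = policy_iteration(np.array([0, 0]))
print(seq)  # each successive policy strictly improves on its predecessor
```

Each policy in the printed sequence dominates the one before it, which is the per-run quantity whose worst-case length f(n, k) bounds. Note that greedy policy iteration switches every improvable state at once; the worst-case constructions in this line of work instead rely on update rules that are free to switch a single state-action pair at a time, which yields much longer sequences.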
Published: 2025-09-16
How to Cite
Agarwal, P., Wajid, M. S., & Kalyanakrishnan, S. (2025). An Improved Lower Bound on the Length of Locally-Improving Policy Sequences in MDPs with Large Action Sets. Proceedings of the International Conference on Automated Planning and Scheduling, 35(1), 2-10. https://doi.org/10.1609/icaps.v35i1.36095
Issue
Section
Theoretical papers