An Improved Lower Bound on the Length of Locally-Improving Policy Sequences in MDPs with Large Action Sets
DOI: https://doi.org/10.1609/icaps.v35i1.36095
Abstract
Popular algorithms to solve Markov Decision Problems (MDPs) include policy iteration and the Simplex method (executed on an induced linear program). Each run of these algorithms can be associated with a sequence of "locally-improving" policies for the input MDP. For integers n >= 2, k >= 2, let f(n, k) denote the length of the longest possible sequence of locally-improving policies for any MDP with n states and k actions per state. An alternative view of f(n, k) is as a descriptive structural property of the policy space of MDPs: it is the largest possible "c-height" in an induced "LP-digraph" of any n-state, k-action MDP. How large can f(n, k) be? A trivial upper bound on f(n, k) is the total number of (Markovian, deterministic) policies, which is k^{n}. A construction from Melekopoglou and Condon (1994) shows that f(n, 2) = 2^{n}, implying that the trivial upper bound is tight for k = 2. For k >= 3, the tightest lower bound on f(n, k) in the current literature is only Omega(k^{n / 2}) (Ashutosh et al., 2020). In this paper, we propose a family of MDPs to show a lower bound of Omega( (floor(k / 2) )^{n}) on f(n, k)---giving an exponential-in-n tightening for each k >= 6. Our investigation brings out technical challenges that do not arise for k = 2. Our result still leaves open the important question of whether f(n, k) is indeed k^{n} for n >= 2, k >= 2. We furnish an affirmative answer for the special case of n = 2, k >= 2.
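To make the notion of a locally-improving policy sequence concrete, the following is a minimal sketch of greedy policy iteration on a hypothetical 2-state, 2-action MDP. The transition probabilities, rewards, and discount factor are made-up illustrative values, not taken from the paper's construction; the sequence of policies the loop visits is one example of a locally-improving sequence (here of length at most k^n = 4).

```python
import numpy as np

GAMMA = 0.9  # discount factor (illustrative choice)

# Hypothetical MDP: P[s, a, s'] = transition probability, R[s, a] = reward.
P = np.array([
    [[0.5, 0.5], [0.2, 0.8]],   # state 0, actions 0 and 1
    [[0.9, 0.1], [0.3, 0.7]],   # state 1, actions 0 and 1
])
R = np.array([
    [0.0, 1.0],
    [2.0, 0.5],
])

def evaluate(policy):
    """Solve (I - gamma * P_pi) V = R_pi exactly for the policy's value."""
    n = P.shape[0]
    P_pi = np.array([P[s, policy[s]] for s in range(n)])
    R_pi = np.array([R[s, policy[s]] for s in range(n)])
    return np.linalg.solve(np.eye(n) - GAMMA * P_pi, R_pi)

def policy_iteration(policy):
    """Greedy policy iteration; returns the sequence of policies visited."""
    seq = [tuple(policy)]
    while True:
        V = evaluate(policy)
        Q = R + GAMMA * P @ V          # Q[s, a] = one-step lookahead value
        greedy = Q.argmax(axis=1)
        if np.array_equal(greedy, policy):
            return seq                 # no improving action anywhere: optimal
        policy = greedy
        seq.append(tuple(policy))

seq = policy_iteration(np.array([0, 0]))
print(seq)  # each successive policy strictly improves on its predecessor
```

Each policy in the printed sequence dominates the one before it, which is the per-run quantity whose worst-case length f(n, k) bounds. Note that greedy policy iteration switches every improvable state at once; the worst-case constructions in this line of work instead rely on update rules that are free to switch a single state-action pair at a time, which yields much longer sequences.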
Published: 2025-09-16
How to Cite
Agarwal, P., Wajid, M. S., & Kalyanakrishnan, S. (2025). An Improved Lower Bound on the Length of Locally-Improving Policy Sequences in MDPs with Large Action Sets. Proceedings of the International Conference on Automated Planning and Scheduling, 35(1), 2-10. https://doi.org/10.1609/icaps.v35i1.36095
Issue
Section
Theoretical papers