TY - JOUR
AU - Ferrer-Mestres, Jonathan
AU - Dietterich, Thomas G.
AU - Buffet, Olivier
AU - Chadès, Iadine
PY - 2020/06/01
Y2 - 2021/02/25
TI - Solving K-MDPs
JF - Proceedings of the International Conference on Automated Planning and Scheduling
JA - ICAPS
VL - 30
IS - 1
SE - Main Track
DO -
UR - https://ojs.aaai.org/index.php/ICAPS/article/view/6651
SP - 110
EP - 118
AB - Markov Decision Processes (MDPs) are employed to model sequential decision-making problems under uncertainty. Traditionally, algorithms to solve MDPs have focused on solving large state or action spaces. With increasing applications of MDPs to human-operated domains such as conservation of biodiversity and health, developing easy-to-interpret solutions is of paramount importance to increase uptake of MDP policies. Here, we define the problem of solving K-MDPs, i.e., given an original MDP and a constraint on the number of states (K), generate a reduced state space MDP that minimizes the difference between the original optimal MDP value function and the reduced optimal K-MDP value function. Building on existing non-transitive and transitive approximate state abstraction functions, we propose a family of three algorithms based on binary search with sub-optimality bounded polynomially in a precision parameter: ϕ_{Q*ε} K-MDP-ILP, ϕ_{Q*d} K-MDP and ϕ_{a*d} K-MDP. We compare these algorithms to a greedy algorithm (ϕ_{Q*ε} Greedy K-MDP) and a clustering approach (k-means++ K-MDP). On randomly generated MDPs and two computational sustainability MDPs, ϕ_{a*d} K-MDP outperformed all algorithms when it could find a feasible solution. While numerous state abstraction problems have been proposed in the literature, this is the first time that the general problem of solving K-MDPs is suggested. We hope that our work will generate future research aiming at increasing the interpretability of MDP policies in human-operated domains.
ER -