Covering Number as a Complexity Measure for POMDP Planning and Learning

Authors

  • Zongzhang Zhang University of Science and Technology of China
  • Michael Littman Rutgers University
  • Xiaoping Chen University of Science and Technology of China

DOI:

https://doi.org/10.1609/aaai.v26i1.8360

Keywords:

Partially observable Markov decision processes, complexity measure, covering number, planning and learning

Abstract

Finding a meaningful way of characterizing the difficulty of partially observable Markov decision processes (POMDPs) is a core theoretical problem in POMDP research. State-space size is often used as a proxy for POMDP difficulty, but it is a weak metric at best. Existing work has shown that the covering number for the reachable belief space, which is a set of belief points that are reachable from the initial belief point, has interesting links with the complexity of POMDP planning, theoretically. In this paper, we present empirical evidence that the covering number for the reachable belief space (or just ``covering number", for brevity) is a far better complexity measure than the state-space size for both planning and learning POMDPs on several small-scale benchmark problems. We connect the covering number to the complexity of learning POMDPs by proposing a provably convergent learning algorithm for POMDPs without reset given knowledge of the covering number.

Downloads

Published

2021-09-20

How to Cite

Zhang, Z., Littman, M., & Chen, X. (2021). Covering Number as a Complexity Measure for POMDP Planning and Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 26(1), 1853-1859. https://doi.org/10.1609/aaai.v26i1.8360

Issue

Section

Reasoning about Plans, Processes and Actions