Data-Driven Decision-Theoretic Planning using Recurrent Sum-Product-Max Networks

Authors

  • Hari Tatavarti, Institute for AI, University of Georgia, Athens, GA 30602 USA
  • Prashant Doshi, Dept. of Computer Science & Institute for AI, University of Georgia, Athens, GA 30602 USA
  • Layton Hayes, Institute for AI, University of Georgia, Athens, GA 30602 USA

Keywords:

Model Representation And Learning Domain Models For Planning, Representations For Learned Models In Planning, Reinforcement Learning Using Planning (model-based, Bayesian, Deep, Etc.)

Abstract

Sum-product networks (SPNs) are knowledge compilation models and are related to other graphical models for efficient probabilistic inference such as arithmetic circuits and AND/OR graphs. Recent investigations into generalizing SPNs have yielded sum-product-max networks (SPMNs), which offer a data-driven alternative for decision making, a task that has predominantly relied on handcrafted models. However, SPMNs are not suited for decision-theoretic planning, which involves sequential decision making over multiple time steps. In this paper, we present recurrent SPMNs (RSPMNs) that learn from and model decision-making data over time. RSPMNs utilize a template network that is unfolded as needed depending on the length of the data sequence. This is significant because RSPMNs not only inherit the benefits of SPNs in being data driven and mostly tractable, but are also well suited for planning problems. We establish soundness conditions on the template network, which guarantee that the resulting SPMN is valid, and present a structure learning algorithm to learn a sound template. RSPMNs learned on a testbed of data sets, some generated using RDDLSim, yield MEUs and policies that are close to optimal on perfectly-observed domains and easily improve on a recent batch-constrained RL method. This is important because RSPMNs offer a new model-based approach to offline RL.

Published

2021-05-17

How to Cite

Tatavarti, H., Doshi, P., & Hayes, L. (2021). Data-Driven Decision-Theoretic Planning using Recurrent Sum-Product-Max Networks. Proceedings of the International Conference on Automated Planning and Scheduling, 31(1), 606-614. Retrieved from https://ojs.aaai.org/index.php/ICAPS/article/view/16009