An Exact Algorithm to Make a Trade-Off between Cost and Probability in SSPs
In stochastic sequential decision problems, such as Stochastic Shortest Path Problems, the GUBS (Goal with UtilityBased Semantics) criterion considers a trade-off between probability-to-goal and cost-to-goal using a goal semantics based on Expected Utility Theory (EUT); in such a semantics, goal paths have priority over non-goal paths, but it implies neither the MAXPROB criterion nor the dual optimization criterion that finds the cheapest policy among the policies that maximize goal probability. Whereas evaluation criteria based on a sound theory such as EUT are desirable, optimal policies under GUBS are non-markovian. Non-markovian solutions are undesirable because there is not always a finite representation and even if it can be represented in a finite way, the representation may be too large to be stored. Here we define a special case of GUBS criterion that allows a finite representation to the optimal policy, the eGUBS criterion, where the cost utility function is exponencial. Considering this special case, we contribute with: (i) the proof that the eGUBSoptimal policy has a finite representation; (ii) the first exact algorithm to obtain finite optimal policies for the eGUBS criterion, and (iii) four strategies to find sub-optimal policies. We conduct experiments on one synthetic problem to evaluate each strategy. Although optimal solutions have a high memory cost, sub-optimal policies can save memory space with a small decrease in performance.