Policy Comparison Oracles for Action Policy Testing

Ben Sievers; Jan Eisenhut; Jörg Hoffmann

doi:10.1609/icaps.v36i1.42859

Authors

Ben Sievers, Saarland University, Saarland Informatics Campus, Saarbrücken, Germany
Jan Eisenhut, Saarland University, Saarland Informatics Campus, Saarbrücken, Germany
Jörg Hoffmann, Saarland University, Saarland Informatics Campus, Saarbrücken, Germany; German Research Center for Artificial Intelligence (DFKI), Saarbrücken, Germany

DOI:

https://doi.org/10.1609/icaps.v36i1.42859

Abstract

Testing is a natural quality assurance technique for learned action policies π. In classical planning, the testing process attempts to find states, called bugs, on which the plan generated by π is sub-optimal. A major challenge in this context is the design of test oracles, i.e., sufficient criteria for identifying bugs. Here, we introduce a new type of oracle, which we call policy comparison oracles (PCOs). These are based on comparing π with a set of policies π' produced during the training process for π. Trivially, on a given state s, if any π' is better than π, then s is a bug in π. But the potential of policy comparison reaches far beyond that. For example, π' may produce a better sub-plan on s even if it does not reach the goal at all. We introduce a combination method that allows arbitrary alternation between policies at testing time, thus leveraging their combined potential. We run experiments using ASNets policies. PCOs turn out to be competitive with state-of-the-art test oracles on their own, and their integration with other oracles is superior in our evaluation.
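
The idea of alternating between policies can be illustrated with a small sketch. This is not the authors' algorithm, only a toy reading of the abstract under several assumptions: actions are deterministic and have unit cost, a policy is a function from a state to its successor state, and a state on which π never reaches the goal while some combined plan does is also reported as a bug. All names (`policy_cost`, `combined_cost`, `is_bug`, `pi`, `pi_prime`) are hypothetical. The sketch relies on one certain fact: any plan found by mixing policies is a real plan, so its cost is an upper bound on the optimal cost, and a combined plan cheaper than π's plan proves that π is sub-optimal on that state.

```python
from collections import deque


def policy_cost(policy, state, goal, max_steps=100):
    """Unit-cost length of the plan `policy` produces from `state`.

    Returns None if the goal is not reached within `max_steps` steps.
    """
    for steps in range(max_steps + 1):
        if state == goal:
            return steps
        state = policy(state)
    return None


def combined_cost(policies, state, goal, max_steps=100):
    """Cheapest plan when any policy may choose the next step.

    Breadth-first search over the graph in which every state has one
    successor per policy (assumption: unit action costs).
    """
    frontier = deque([(state, 0)])
    seen = {state}
    while frontier:
        current, steps = frontier.popleft()
        if current == goal:
            return steps
        if steps == max_steps:
            continue
        for policy in policies:
            successor = policy(current)
            if successor not in seen:
                seen.add(successor)
                frontier.append((successor, steps + 1))
    return None


def is_bug(pi, others, state, goal):
    """Sufficient (not necessary) bug criterion based on policy comparison."""
    own = policy_cost(pi, state, goal)
    best = combined_cost([pi, *others], state, goal)
    return best is not None and (own is None or best < own)


# Toy task: states are integers, the goal is state 6.
GOAL = 6


def pi(state):
    # Policy under test: always moves one step to the right.
    return state + 1


def pi_prime(state):
    # Hypothetical earlier policy: knows a shortcut from 0 to 4,
    # but stalls everywhere else and never reaches the goal alone.
    return 4 if state == 0 else state


print(policy_cost(pi, 0, GOAL))             # 6
print(policy_cost(pi_prime, 0, GOAL))       # None
print(combined_cost([pi, pi_prime], 0, GOAL))  # 3
print(is_bug(pi, [pi_prime], 0, GOAL))      # True
print(is_bug(pi, [pi_prime], 4, GOAL))      # False
```

In this toy task `pi_prime` never reaches the goal on its own, yet its first step from state 0 yields a better sub-plan, so the combined plan (one step of `pi_prime`, then two steps of `pi`) exposes state 0 as a bug. On state 4 the combined search finds nothing cheaper than `pi`'s own plan, so the oracle stays silent; being a sufficient criterion only, that silence does not show that `pi` is optimal there.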
