Policy Comparison Oracles for Action Policy Testing

Authors

  • Ben Sievers, Saarland University, Saarland Informatics Campus, Saarbrücken, Germany
  • Jan Eisenhut, Saarland University, Saarland Informatics Campus, Saarbrücken, Germany
  • Jörg Hoffmann, Saarland University, Saarland Informatics Campus, Saarbrücken, Germany; German Research Center for Artificial Intelligence (DFKI), Saarbrücken, Germany

DOI:

https://doi.org/10.1609/icaps.v36i1.42859

Abstract

Testing is a natural quality assurance technique for learned action policies π. In classical planning, the testing process attempts to find states, called bugs, on which the plan generated by π is sub-optimal. A major challenge in this context is the design of test oracles, that is, sufficient criteria for identifying bugs. Here, we introduce a new type of oracle, which we call policy comparison oracles (PCOs). These are based on comparing π with a set of policies π' produced during the training process for π. Trivially, on a given state s, if any π' is better than π, then s is a bug in π. But the potential of policy comparison reaches far beyond that. For example, π' may produce a better sub-plan on s even if it does not reach the goal at all. We introduce a combination method that allows arbitrary alternation between policies at testing time, thus leveraging their combined potential. We run experiments using ASNets policies. PCOs turn out to be competitive with state-of-the-art test oracles on their own, and their integration with other oracles is superior in our evaluation.
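
The trivial case described above (a state s is a bug in π if some π' yields a cheaper plan from s) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `plan_cost` and `is_bug`, the unit action costs, the deterministic toy domain, and the treatment of a run that fails to reach the goal as infinitely costly are all assumptions made for illustration. The sketch does not cover sub-plan comparison or the combination method, whose details the abstract does not give.

```python
import math
from typing import Callable, Iterable

# Hypothetical modelling choice: a policy maps a state directly to its successor state.
State = int
Policy = Callable[[State], State]


def plan_cost(policy: Policy, state: State,
              is_goal: Callable[[State], bool], max_steps: int = 1000) -> float:
    """Run the policy from `state`; return the number of unit-cost steps to the goal.

    Returns math.inf if the goal is not reached within max_steps (assumption).
    """
    steps = 0
    while not is_goal(state):
        if steps >= max_steps:
            return math.inf
        state = policy(state)
        steps += 1
    return steps


def is_bug(state: State, policy: Policy, others: Iterable[Policy],
           is_goal: Callable[[State], bool], max_steps: int = 1000) -> bool:
    """Sufficient (not necessary) bug criterion: some other policy is strictly cheaper."""
    base = plan_cost(policy, state, is_goal, max_steps)
    return any(plan_cost(other, state, is_goal, max_steps) < base for other in others)


# Toy domain (assumption): states are non-negative integers, the goal is 0.
def is_goal(s: State) -> bool:
    return s == 0


def pi(s: State) -> State:
    """Policy under test: always decrements by 1."""
    return s - 1


def pi_prime(s: State) -> State:
    """Comparison policy: decrements by 2 when possible, otherwise by 1."""
    return s - 2 if s >= 2 else s - 1


print(is_bug(4, pi, [pi_prime], is_goal))  # pi needs 4 steps, pi_prime needs 2
print(is_bug(1, pi, [pi_prime], is_goal))  # both need 1 step
```

Because the check only ever reports states where a strictly cheaper plan was actually found, it can miss bugs but cannot report a false one, which matches the notion of an oracle as a sufficient criterion.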

Published

2026-06-08

How to Cite

Sievers, B., Eisenhut, J., & Hoffmann, J. (2026). Policy Comparison Oracles for Action Policy Testing. Proceedings of the International Conference on Automated Planning and Scheduling, 36(1), 430–434. https://doi.org/10.1609/icaps.v36i1.42859