Learning Lifted Action Models from Unsupervised Visual Traces

Authors

  • Kai Xi, School of Computing, The Australian National University
  • Stephen Gould, School of Computing, The Australian National University
  • Sylvie Thiebaux, School of Computing, The Australian National University; LAAS-CNRS, Universite de Toulouse

DOI:

https://doi.org/10.1609/icaps.v36i1.42887

Abstract

Efficient construction of models capturing the preconditions and effects of actions is essential for applying AI planning in real-world domains. Extensive prior work has explored learning such models from high-level descriptions of state and/or action sequences. In this paper, we tackle a more challenging setting: learning lifted action models from sequences of state images, without action observation. We propose a deep learning framework that jointly learns state prediction, action prediction, and a lifted action model. We also introduce a mixed-integer linear program (MILP) to prevent prediction collapse and self-reinforcing errors among predictions. The MILP takes the predicted states, actions, and action model over a subset of traces and solves for logically consistent states, actions, and action model that are as close as possible to the original predictions. Pseudo-labels extracted from the MILP solution are then used to guide further training. Experiments across multiple domains show that integrating MILP-based correction helps the model escape local optima and converge toward globally consistent solutions.
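
To make the correction step concrete, the following is a minimal sketch of the idea, not the authors' formulation. It assumes a hypothetical propositional toy domain with two propositions and two actions (make_p, drop_q), holds the action model fixed rather than solving for it, and replaces the MILP solver with brute-force enumeration, which is only feasible at this tiny scale. All names and probabilities are invented for illustration.

```python
from itertools import product

# Hypothetical toy action model over two propositions (index 0 = p, 1 = q).
# Each action maps to (preconditions, add effects, delete effects).
# Assumption: the model is fixed here; the paper's MILP also solves for it.
ACTIONS = {
    "make_p": ({1}, {0}, set()),  # requires q, adds p
    "drop_q": ({0}, set(), {1}),  # requires p, deletes q
}


def successor(state, action):
    """Return the state after applying action, or None if a precondition fails."""
    pre, add, delete = ACTIONS[action]
    if any(state[i] == 0 for i in pre):
        return None
    return tuple(1 if i in add else 0 if i in delete else v
                 for i, v in enumerate(state))


def correct(state_probs, action_probs):
    """Find the logically consistent trace closest (L1 distance) to the predictions.

    Brute-force enumeration stands in for a MILP solver in this sketch.
    """
    n_props = len(state_probs[0])
    best = None
    for s0 in product((0, 1), repeat=n_props):
        for plan in product(ACTIONS, repeat=len(action_probs)):
            states = [s0]
            for a in plan:
                nxt = successor(states[-1], a)
                if nxt is None:
                    break  # inconsistent: precondition violated
                states.append(nxt)
            else:
                # Distance between the binary states and predicted probabilities.
                cost = sum(abs(v - p)
                           for s, ps in zip(states, state_probs)
                           for v, p in zip(s, ps))
                # Distance between the chosen actions and predicted action probabilities.
                cost += sum(1.0 - ap[a] for a, ap in zip(plan, action_probs))
                if best is None or cost < best[0]:
                    best = (cost, states, list(plan))
    return best


# Invented predictions for a trace of three states and two transitions.
state_probs = [(0.1, 0.9), (0.4, 0.8), (0.9, 0.2)]
action_probs = [
    {"make_p": 0.6, "drop_q": 0.4},
    {"make_p": 0.3, "drop_q": 0.7},
]

cost, states, plan = correct(state_probs, action_probs)
pseudo_labels = {"states": states, "actions": plan}
```

In this toy trace, rounding the predictions independently would label the first two states identically, which no action in the toy model can produce. The search instead flips the uncertain proposition in the middle state so that the whole trace is consistent, and the resulting states and actions play the role of the pseudo-labels described above.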

Published

2026-06-08

How to Cite

Xi, K., Gould, S., & Thiebaux, S. (2026). Learning Lifted Action Models from Unsupervised Visual Traces. Proceedings of the International Conference on Automated Planning and Scheduling, 36(1), 687–696. https://doi.org/10.1609/icaps.v36i1.42887