Proceedings of the International Conference on Automated Planning and Scheduling
Feed: https://ojs.aaai.org/index.php/ICAPS/issue/feed (updated 2024-05-30)
Publications Department (publications21@aaai.org), Open Journal Systems

The annual ICAPS conference series was formed in 2003 through the merger of two preexisting biennial conferences, the International Conference on Artificial Intelligence Planning and Scheduling (AIPS) and the European Conference on Planning (ECP). ICAPS continues the traditional high standards of AIPS and ECP as an archival forum for new research in the field of automated planning and scheduling. The Proceedings of the International Conference on Automated Planning and Scheduling contains the annual, archival published work of the ICAPS conference.

All articles below were published on 2024-05-30. Copyright (c) 2024 Association for the Advancement of Artificial Intelligence.

Specifying Goals to Deep Neural Networks with Answer Set Programming
Authors: Forest Agostinelli (foresta@cse.sc.edu), Rojina Panta (rpanta@email.sc.edu), Vedant Khandelwal (vedant@mailbox.sc.edu)
https://ojs.aaai.org/index.php/ICAPS/article/view/31454
Recently, methods such as DeepCubeA have used deep reinforcement learning to learn domain-specific heuristic functions in a largely domain-independent fashion. However, such methods either assume a predetermined goal or assume that goals will be given as fully-specified states. Therefore, specifying a set of goal states to these learned heuristic functions is often impractical. To address this issue, we introduce a method of training a heuristic function that estimates the distance between a given state and a set of goal states represented as a set of ground atoms in first-order logic. Furthermore, to allow for more expressive goal specification, we introduce techniques for specifying goals as answer set programs and using answer set solvers to discover sets of ground atoms that meet the specified goals. In our experiments with the Rubik's cube, sliding tile puzzles, and Sokoban, we show that we can specify and reach different goals without any need to re-train the heuristic function. Our code is publicly available at https://github.com/forestagostinelli/SpecGoal.
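Editor's illustration (not part of the paper above): the entry distinguishes fully-specified goal states from goals given as sets of ground atoms. The minimal Python sketch below shows that representation; the atom strings are invented examples, and the real SpecGoal system additionally uses answer set solvers to enumerate such atom sets.

```python
# Illustrative sketch only (not the SpecGoal implementation): representing a goal
# as a set of ground atoms and checking whether a state satisfies it.
# The atom names (e.g. "on(a,b)") are hypothetical examples.

State = frozenset  # a state is the set of ground atoms that hold in it
Goal = frozenset   # a goal is a set of ground atoms that must all hold

def satisfies(state: State, goal: Goal) -> bool:
    """A state satisfies the goal iff every goal atom holds in it."""
    return goal <= state

state = State({"on(a,b)", "on(b,table)", "clear(a)"})
goal = Goal({"on(a,b)"})          # partially specified: only one atom is constrained
print(satisfies(state, goal))     # True: many distinct states satisfy this goal
```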
Exact Multi-objective Path Finding with Negative Weights
Authors: Saman Ahmadi (saman-ahmadi@live.com), Nathan R. Sturtevant (nathanst@ualberta.ca), Daniel Harabor (daniel.harabor@monash.edu), Mahdi Jalili (mahdi.jalili@rmit.edu.au)
https://ojs.aaai.org/index.php/ICAPS/article/view/31455
The point-to-point Multi-objective Shortest Path (MOSP) problem is a classic yet challenging task that involves finding all Pareto-optimal paths between two points in a graph with multiple edge costs. Recent studies have shown that employing A* search can lead to state-of-the-art performance in solving MOSP instances with non-negative costs. This paper proposes a novel A*-based multi-objective search framework that not only handles graphs with negative costs and even negative cycles but also incorporates multiple speed-up techniques to enhance the efficiency of exhaustive search with A*. Through extensive experiments, our algorithm demonstrates remarkable success in solving difficult MOSP instances, outperforming leading solutions by several factors.

On the Computational Complexity of Stackelberg Planning and Meta-Operator Verification
Authors: Gregor Behnke (galvusdamor@gmail.com), Marcel Steinmetz (marcel.steinmetz@laas.fr)
https://ojs.aaai.org/index.php/ICAPS/article/view/31456
Stackelberg planning is a recently introduced single-turn two-player adversarial planning model, where two players act in a joint classical planning task, the objective of the first player being to hamper the second player from achieving its goal. This places the Stackelberg planning problem somewhere between classical planning and general combinatorial two-player games. But where exactly? All investigations of Stackelberg planning so far focused on practical aspects. We close this gap by conducting the first theoretical complexity analysis of Stackelberg planning. We show that in general Stackelberg planning is actually no harder than classical planning. Under a polynomial plan-length restriction, however, Stackelberg planning is a level higher up in the polynomial complexity hierarchy, suggesting that compilations into classical planning come with a worst-case exponential plan-length increase. In attempts to identify tractable fragments, we further study its complexity under various planning task restrictions, showing that Stackelberg planning remains intractable where classical planning is not. We finally inspect the complexity of meta-operator verification, a problem that has been recently connected to Stackelberg planning.

Non-deterministic Planning for Hyperproperty Verification
Authors: Raven Beutner (raven.beutner@cispa.de), Bernd Finkbeiner (finkbeiner@cispa.saarland)
https://ojs.aaai.org/index.php/ICAPS/article/view/31457
Non-deterministic planning aims to find a policy that achieves a given objective in an environment where actions have uncertain effects, and the agent - potentially - only observes parts of the current state. Hyperproperties are properties that relate multiple paths of a system and can, e.g., capture security and information-flow policies. Popular logics for expressing temporal hyperproperties - such as HyperLTL - extend LTL by offering selective quantification over executions of a system. In this paper, we show that planning offers a powerful intermediate language for the automated verification of hyperproperties. Concretely, we present an algorithm that, given a HyperLTL verification problem, constructs a non-deterministic multi-agent planning instance (in the form of a QDec-POMDP) that, when admitting a plan, implies the satisfaction of the verification problem. We show that for large fragments of HyperLTL, the resulting planning instance corresponds to a classical, FOND, or POND planning problem. We implement our encoding in a prototype verification tool and report on encouraging experimental results.

On Policy Reuse: An Expressive Language for Representing and Executing General Policies that Call Other Policies
Authors: Blai Bonet (bonet@cs.ucla.edu), Dominik Drexler (dominik.drexler@liu.se), Héctor Geffner (hector.geffner@ml.rwth-aachen.de)
https://ojs.aaai.org/index.php/ICAPS/article/view/31458
Recently, a simple but powerful language for expressing and learning general policies and problem decompositions (sketches) has been introduced in terms of rules defined over a set of Boolean and numerical features. In this work, we consider three extensions of this language aimed at making policies and sketches more flexible and reusable: internal memory states, as in finite state controllers; indexical features, whose values are a function of the state and a number of internal registers that can be loaded with objects; and modules that wrap up policies and sketches and allow them to call each other by passing parameters. In addition, unlike general policies that select state transitions rather than ground actions, the new language allows for the selection of such actions. The expressive power of the resulting language for policies and sketches is illustrated through a number of examples.

Abstraction Heuristics for Factored Tasks
Authors: Clemens Büchner (clemens.buechner@unibas.ch), Patrick Ferber (patrick.ferber@unibas.ch), Jendrik Seipp (jendrik.seipp@liu.se), Malte Helmert (malte.helmert@unibas.ch)
https://ojs.aaai.org/index.php/ICAPS/article/view/31459
One of the strongest approaches for optimal classical planning is A* search with heuristics based on abstractions of the planning task. Abstraction heuristics are well studied in planning formalisms without conditional effects such as SAS+. However, conditional effects are crucial to model many planning tasks compactly. In this paper, we focus on *factored* tasks, which allow a specific form of conditional effect, where effects on variable x can only depend on the value of x. We generalize projections, domain abstractions, Cartesian abstractions and the counterexample-guided abstraction refinement method to this formalism. While merge-and-shrink already covers factored tasks in theory, we provide an implementation that does so. In our experiments, we compare these abstraction-based heuristics to other heuristics supporting conditional effects, as well as symbolic search. On our new benchmark set of factored tasks, pattern database heuristics solve the most problems, followed by symbolic approaches on par with domain abstractions. The more general Cartesian abstractions fall behind in terms of coverage but usually solve problems the fastest among all tested approaches. The generality of merge-and-shrink abstractions does not seem to be beneficial for these factored tasks.
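Editor's illustration (not from the paper above): a minimal sketch of the "factored" restriction on conditional effects, namely that the effect on a variable x may only be conditioned on the current value of x itself. The variable and value names are invented.

```python
# Sketch of a factored operator: each variable carries its own
# {old value -> new value} rules and never looks at other variables.

from typing import Dict

State = Dict[str, str]                       # variable -> current value
FactoredEffect = Dict[str, Dict[str, str]]   # variable -> {old value -> new value}

def apply_factored_operator(state: State, effect: FactoredEffect) -> State:
    """Apply per-variable conditional effects; each rule depends only on its own variable."""
    successor = dict(state)
    for var, rules in effect.items():
        old = state[var]
        if old in rules:                     # the condition mentions var only
            successor[var] = rules[old]
    return successor

# Example: a "toggle" operator that flips a switch regardless of the other variables.
state = {"switch": "off", "door": "closed"}
toggle = {"switch": {"off": "on", "on": "off"}}
print(apply_factored_operator(state, toggle))  # {'switch': 'on', 'door': 'closed'}
```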
Multi-Agent Temporal Task Solving and Plan Optimization
Authors: J. Caballero Testón (javier.caballerot@edu.uah.es), Maria D. R-Moreno (malola.rmoreno@uah.es)
https://ojs.aaai.org/index.php/ICAPS/article/view/31460
Several multi-agent techniques are utilized to reduce the complexity of classical planning tasks; however, their applicability to temporal planning domains is a currently open line of study in the field of Automated Planning. In this paper, we present MA-LAMA, a factored, centralized, unthreaded, satisficing, multi-agent temporal planner that exploits the 'multi-agent nature' of temporal domains to perform plan optimization. In MA-LAMA, temporal tasks are translated to the constrained snap-actions paradigm, and an automatic agent decomposition, goal assignment, and required cooperation analysis are carried out to build independent search steps, called Search Phases. These Search Phases are then solved by consecutive agent local searches, using classical heuristics and temporal constraints. Experiments show that MA-LAMA is able to solve a wide range of classical and temporal multi-agent domains, performing significantly better in plan quality than other state-of-the-art temporal planners.

Taming Discretised PDDL+ through Multiple Discretisations
Authors: Matteo Cardellini (me@matteocardellini.it), Marco Maratea (marco.maratea@unical.it), Francesco Percassi (f.percassi@hud.ac.uk), Enrico Scala (enrico.scala@unibs.it), Mauro Vallati (m.vallati@hud.ac.uk)
https://ojs.aaai.org/index.php/ICAPS/article/view/31461
The PDDL+ formalism allows the use of planning techniques in applications that require the ability to perform hybrid discrete-continuous reasoning. PDDL+ problems are notoriously challenging to tackle, and a well-established approach to reasoning upon them is discretisation. Existing systems rely on a single discretisation delta or, at most, two: a simulation delta to model the dynamics of the environment, and a planning delta that is used to specify when decisions can be taken. However, there exist cases where this rigid schema is not ideal, for instance when agents with very different speeds need to cooperate or interact in a shared environment, and a more flexible approach that can accommodate more deltas is necessary. To address the needs of this class of hybrid planning problems, in this paper we introduce a reformulation approach that allows the encapsulation of different levels of discretisation in PDDL+ models, hence allowing any domain-independent planning engine to reap the benefits. Further, we provide the community with a new set of benchmarks that highlights the limits of fixed discretisation.

Return to Tradition: Learning Reliable Heuristics with Classical Machine Learning
Authors: Dillon Z. Chen (dillon.chen@laas.fr), Felipe Trevizan (felipe.trevizan@gmail.com), Sylvie Thiébaux (sylvie.thiebaux@anu.edu.au)
https://ojs.aaai.org/index.php/ICAPS/article/view/31462
Current approaches for learning for planning have yet to achieve competitive performance against classical planners in several domains, and have poor overall performance. In this work, we construct novel graph representations of lifted planning tasks and use the WL algorithm to generate features from them. These features are used with classical machine learning methods which have up to 2 orders of magnitude fewer parameters and train up to 3 orders of magnitude faster than the state-of-the-art deep learning for planning models. Our novel approach, WL-GOOSE, reliably learns heuristics from scratch and outperforms the hFF heuristic in a fair competition setting. It also outperforms or ties with LAMA on 4 out of 10 domains on coverage and 7 out of 10 domains on plan quality. WL-GOOSE is the first learning for planning model which achieves these feats. Furthermore, we study the connections between our novel WL feature generation method, previous theoretically flavoured learning architectures, and Description Logic Features for planning.

More Flexible Proximity Wildcards Path Planning with Compressed Path Databases
Authors: Xi Chen (1790144051@qq.com), Yue Zhang (1436388626@qq.com), Yonggang Zhang (zhangyg@jlu.edu.cn)
https://ojs.aaai.org/index.php/ICAPS/article/view/31463
Grid-based path planning is one of the classic problems in AI, and a popular topic in application areas such as computer games and robotics. Compressed Path Databases (CPDs) are recognized as a state-of-the-art method for grid-based path planning; they are able to find an optimal path extremely fast without state-space search. In recent years, researchers have tended to focus on improving CPDs by reducing CPD size or improving search performance. Among various methods, proximity wildcards are one of the most proven improvements in reducing the size of CPDs. However, their proximity area is significantly restricted by complex terrain, which significantly affects the pathfinding efficiency and causes additional costs. In this paper, we enhance CPDs from the perspective of improving search efficiency and reducing search costs. Our work focuses on using more flexible methods to obtain larger proximity areas, so that more heuristic information can be used to improve search performance. Experiments conducted on the Grid-Based Path Planning Competition (GPPC) benchmarks demonstrate that the two proposed methods can effectively improve search efficiency and reduce search costs by up to 3 orders of magnitude. Remarkably, our methods can further reduce the storage cost and improve the compression capability of CPDs simultaneously.
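Editor's illustration (not from the paper above): the core Compressed Path Database idea, before compression and before proximity wildcards, is a first-move table that lets paths be extracted by repeated lookups instead of search. The tiny node numbering and table below are invented.

```python
# first_move[(s, t)] = neighbour of s that lies on an optimal path from s to t.
# A real CPD compresses this table; the extraction loop is the same.
first_move = {
    (0, 2): 1, (1, 2): 2,   # going right along a 3-node corridor 0-1-2
    (2, 0): 1, (1, 0): 0,   # and back again
}

def extract_path(source: int, target: int) -> list:
    """Follow first-move entries until the target is reached (no state-space search)."""
    path = [source]
    current = source
    while current != target:
        current = first_move[(current, target)]
        path.append(current)
    return path

print(extract_path(0, 2))  # [0, 1, 2]
```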
On Verifying Linear Execution Strategies in Planning Against Nature
Authors: Lukáš Chrpa (chrpaluk@cvut.cz), Erez Karpas (karpase@gmail.com)
https://ojs.aaai.org/index.php/ICAPS/article/view/31464
While planning and acting in environments in which nature can trigger non-deterministic events, the agent has to consider that the state of the environment might change without its consent. Practically, it means that the agent has to make sure that it eventually achieves its goal (if possible) despite the acts of nature. In this paper, we first formalize the semantics of such problems in Alternating-time Temporal Logic, which allows us to prove some theoretical properties of different types of solutions. Then, we focus on linear execution strategies, which resemble classical plans in that they follow a fixed sequence of actions. We show that any problem that can be solved by a linear execution strategy can be solved by a particular form of linear execution strategy which assigns wait-for preconditions to each action in the plan that specify when to execute that action. Then, we propose a sound algorithm that verifies a sequence of actions and assigns wait-for preconditions to them by leveraging abstraction.

Planning and Acting While the Clock Ticks
Authors: Andrew Coles (andrew.coles@kcl.ac.uk), Erez Karpas (karpase@gmail.com), Andrey Lavrinenko (andreyl@post.bgu.ac.il), Wheeler Ruml (ruml@cs.unh.edu), Solomon Eyal Shimony (shimony@cs.bgu.ac.il), Shahaf Shperberg (shperbsh@bgu.ac.il)
https://ojs.aaai.org/index.php/ICAPS/article/view/31465
Standard temporal planning assumes that planning takes place offline, and then execution starts at time 0. Recently, situated temporal planning was introduced, where planning starts at time 0, and execution occurs after planning terminates. Situated temporal planning reflects a more realistic scenario where time passes during planning. However, in situated temporal planning a complete plan must be generated before any action is executed. In some problems with time pressure, timing is too tight to complete planning before the first action must be executed. For example, an autonomous car that has a truck backing towards it should probably move out of the way now, and plan how to get to its destination later. In this paper, we propose a new problem setting: concurrent planning and execution, in which actions can be dispatched (executed) before planning terminates. Unlike previous work on planning and execution, we must handle wall clock deadlines that affect action applicability and goal achievement (as in situated planning) while also supporting dispatching actions before a complete plan has been found. We extend previous work on metareasoning for situated temporal planning to develop an algorithm for this new setting. Our empirical evaluation shows that when there is strong time pressure, our approach outperforms situated temporal planning.

Planning with Object Creation
Authors: Augusto B. Corrêa (augusto.blaascorrea@unibas.ch), Giuseppe De Giacomo (degiacomo@diag.uniroma1.it), Malte Helmert (malte.helmert@unibas.ch), Sasha Rubin (sasha.rubin@sydney.edu.au)
https://ojs.aaai.org/index.php/ICAPS/article/view/31466
Classical planning problems are defined using some specification language, such as PDDL. The domain expert defines action schemas, objects, the initial state, and the goal. One key aspect of PDDL is that the set of objects cannot be modified during plan execution. While this is fine in many domains, sometimes it makes modeling more complicated. This may impact the performance of planners, and it requires the domain expert to bound the number of required objects beforehand, which can be a challenge. We introduce an extension to the classical planning formalism, where action effects can create and remove objects. This problem is semi-decidable, but it becomes decidable if we can bound the number of objects in any given state, even though the state space is still infinite. On the practical side, we extend the Powerlifted planning system to support this PDDL extension. Our results show that this extension improves the performance of Powerlifted while supporting more natural PDDL models.

Multi-Objective Electric Vehicle Route and Charging Planning with Contraction Hierarchies
Authors: Marek Cuchý (marek.cuchy@gmail.com), Jiří Vokřínek (jiri.vokrinek@fel.cvut.cz), Michal Jakob (jakobmic@fel.cvut.cz)
https://ojs.aaai.org/index.php/ICAPS/article/view/31467
Electric vehicle (EV) travel planning is a complex task that involves planning the routes and the charging sessions for EVs while optimizing travel duration and cost. We show the applicability of the multi-objective EV travel planning algorithm with practically usable solution times on country-sized road graphs with a large number of charging stations and a realistic EV model. The approach is based on multi-objective A* search enhanced by Contraction Hierarchies, optimal dimensionality reduction, and sub-optimal ϵ-relaxation techniques. We performed an extensive empirical evaluation on 182,000 problem instances showing the impact of various algorithm settings on real-world maps of Bavaria and Germany with more than 12,000 charging stations. The results show the proposed approach is the first one capable of performing such a genuine multi-objective optimization on realistically large country-scale problem instances while achieving practically usable planning times on the order of seconds with only a minor loss of solution quality. The achieved speed-up varies from ~11× for optimal solutions to more than 250× for sub-optimal solutions compared to vanilla multi-objective A*.
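Editor's illustration (not from the paper above): multi-objective planners such as the EV planner keep only Pareto-optimal labels, e.g. (travel duration, monetary cost) pairs. A minimal sketch of the dominance test and filter follows; the example labels are invented.

```python
# Pareto dominance over bi-objective labels (duration, cost), both to be minimized.
from typing import List, Tuple

Label = Tuple[float, float]  # (travel duration in minutes, monetary cost in euros)

def dominates(a: Label, b: Label) -> bool:
    """a dominates b if a is no worse in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_filter(labels: List[Label]) -> List[Label]:
    """Keep only the non-dominated labels."""
    return [l for l in labels
            if not any(dominates(other, l) for other in labels if other != l)]

labels = [(120.0, 8.5), (150.0, 6.0), (130.0, 9.0)]
print(pareto_filter(labels))  # [(120.0, 8.5), (150.0, 6.0)]
```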
Combined Task and Motion Planning via Sketch Decompositions
Authors: Magí Dalmau Moreno (magi.dalmau@eurecat.org), Néstor García (nestor.garcia@eurecat.org), Vicenç Gómez (vicen.gomez@upf.edu), Héctor Geffner (hector.geffner@ml.rwth-aachen.de)
https://ojs.aaai.org/index.php/ICAPS/article/view/31468
The challenge in combined task and motion planning (TAMP) is the effective integration of a search over a combinatorial space, usually carried out by a task planner, and a search over a continuous configuration space, carried out by a motion planner. Using motion planners for testing the feasibility of task plans and filling out the details is not effective because it makes the geometrical constraints play a passive role. This work introduces a new interleaved approach for integrating the two dimensions of TAMP that makes use of sketches, a recent simple but powerful language for expressing the decomposition of problems into subproblems. A sketch has width 1 if it decomposes the problem into subproblems that can be solved greedily in linear time. In the paper, a general sketch is introduced for several classes of TAMP problems which has width 1 under suitable assumptions. While sketch decompositions have been developed for classical planning, they offer two important benefits in the context of TAMP. First, when a task plan is found to be unfeasible due to the geometric constraints, the combinatorial search resumes in a specific subproblem. Second, the sampling of object configurations is not done once, globally, at the start of the search, but locally, at the start of each subproblem. Optimizations of this basic setting are also considered and experimental results over existing and new pick-and-place benchmarks are reported.

Planning Domain Simulation: An Interactive System for Plan Visualisation
Authors: Emanuele De Pellegrin (ed50@hw.ac.uk), Ronald P. A. Petrick (r.petrick@hw.ac.uk)
https://ojs.aaai.org/index.php/ICAPS/article/view/31469
Representing and manipulating domain knowledge is essential for developing systems that can visualize plans. This paper presents a novel plan visualisation system called Planning Domain Simulation (PDSim) that employs knowledge representation and manipulation techniques to support the plan visualization process. PDSim can use PDDL or the Unified Planning Library's Python representation as the underlying language for modelling planning problems and provides an interface for users to manipulate this representation through interaction with the Unity game engine and a set of planners. The system's features include visualising plan components and their relationships, identifying plan conflicts, and examples applied to real-world problems. The benefits and limitations of PDSim are also discussed, highlighting future research directions in the area.

Learning Quadruped Locomotion Policies Using Logical Rules
Authors: David DeFazio (ddefazi1@binghamton.edu), Yohei Hayamizu (yhayami1@binghamton.edu), Shiqi Zhang (zhangs@binghamton.edu)
https://ojs.aaai.org/index.php/ICAPS/article/view/31470
Quadruped animals are capable of exhibiting a diverse range of locomotion gaits. While progress has been made in demonstrating such gaits on robots, current methods rely on motion priors, dynamics models, or other forms of extensive manual effort. People can use natural language to describe dance moves. Could one use a formal language to specify quadruped gaits? To this end, we aim to enable easy gait specification and efficient policy learning. Leveraging Reward Machines (RMs) for high-level gait specification over foot contacts, our approach is called RM-based Locomotion Learning (RMLL), and supports adjusting gait frequency at execution time. Gait specification is enabled through the use of a few logical rules per gait (e.g., alternate between moving front feet and back feet) and does not require labor-intensive motion priors. Experimental results in simulation highlight the diversity of learned gaits (including two novel gaits), their energy consumption and stability across different terrains, and the superior sample-efficiency when compared to baselines. We also demonstrate these learned policies with a real quadruped robot. Video and supplementary materials: https://sites.google.com/view/rm-locomotion-learning/home
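Editor's illustration (not the RMLL specification): a Reward Machine is a finite-state machine over high-level events that emits reward on its transitions. The toy sketch below encodes the rule quoted in the abstract ("alternate between moving front feet and back feet"); the state names, event labels, and reward values are invented, and the real specification is richer.

```python
# RM transition table: (rm_state, observed foot-contact event) -> (next_rm_state, reward)
TRANSITIONS = {
    ("expect_front", "front_feet_moved"): ("expect_back", 1.0),
    ("expect_front", "back_feet_moved"):  ("expect_front", 0.0),
    ("expect_back",  "back_feet_moved"):  ("expect_front", 1.0),
    ("expect_back",  "front_feet_moved"): ("expect_back", 0.0),
}

def rm_step(rm_state: str, event: str):
    """Advance the reward machine on one high-level event and emit its reward."""
    return TRANSITIONS.get((rm_state, event), (rm_state, 0.0))

rm_state, total = "expect_front", 0.0
for event in ["front_feet_moved", "back_feet_moved", "back_feet_moved"]:
    rm_state, reward = rm_step(rm_state, event)
    total += reward
print(total)  # 2.0: reward is only given while front and back feet alternate
```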
Higher-Dimensional Potential Heuristics: Lower Bound Criterion and Connection to Correlation Complexity
Authors: Simon Dold (simon.dold@unibas.ch), Malte Helmert (malte.helmert@unibas.ch)
https://ojs.aaai.org/index.php/ICAPS/article/view/31471
Correlation complexity is a measure of a planning task indicating how hard it is. The work introducing it provides sufficient criteria to detect a correlation complexity of 2 on a planning task. It also introduced an example of a planning task with correlation complexity 3. In our work, we introduce a criterion to detect arbitrary correlation complexity and extend the mentioned example to show with the new criterion that planning tasks with arbitrary correlation complexity exist.

New Fuzzing Biases for Action Policy Testing
Authors: Jan Eisenhut (eisenhut@cs.uni-saarland.de), Xandra Schuler (s8xaschu@stud.uni-saarland.de), Daniel Fišer (danfis@danfis.cz), Daniel Höller (hoeller@cs.uni-saarland.de), Maria Christakis (maria.christakis@tuwien.ac.at), Jörg Hoffmann (hoffmann@cs.uni-saarland.de)
https://ojs.aaai.org/index.php/ICAPS/article/view/31472
Testing was recently proposed as a method to gain trust in learned action policies in classical planning. Test cases in this setting are states generated by a fuzzing process that performs random walks from the initial state. A fuzzing bias attempts to bias these random walks towards policy bugs, that is, states where the policy performs sub-optimally. Prior work explored a simple fuzzing bias based on policy-trace cost. Here, we investigate this topic more deeply. We introduce three new fuzzing biases based on analyses of policy-trace shape, estimating whether a trace is close to looping back on itself, whether it contains detours, and whether its goal-distance surface does not smoothly decline. Our experiments with two kinds of neural action policies show that these new biases improve bug-finding capabilities in many cases.

PDDL+ Models for Deployable yet Effective Traffic Signal Optimisation
Authors: Anas El Kouaiti (elkouaitianas@gmail.com), Francesco Percassi (f.percassi@hud.ac.uk), Alessandro Saetti (alessandro.saetti@unibs.it), Thomas Leo McCluskey (lee@hud.ac.uk), Mauro Vallati (m.vallati@hud.ac.uk)
https://ojs.aaai.org/index.php/ICAPS/article/view/31473
The use of planning techniques in traffic signal optimisation has proven effective in managing unexpected traffic conditions as well as typical traffic patterns. However, significant challenges concerning the deployability of generated signal strategies remain, as existing approaches tend not to consider constraints and features of the actual real-world infrastructure on which they will be implemented. To address this challenge, we introduce a range of PDDL+ models embodying technological requirements as well as insights from domain experts. The proposed models have been extensively tested on historical data using a range of well-known search strategies and heuristics, as well as alternative encodings. Results demonstrate their competitiveness with the state of the art.

Termination Properties of Transition Rules for Indirect Effects
Authors: Mojtaba Elahi (mojtaba.elahi@aalto.fi), Saurabh Fadnis (saurabh.fadnis@aalto.fi), Jussi Rintanen (jrintanen.jr@gmail.com)
https://ojs.aaai.org/index.php/ICAPS/article/view/31474
Indirect effects of an agent's actions have traditionally been formalized as condition-effect rules that always fire whenever applicable, after each action taken by the agent. In this work, we investigate a core problem of indirect effects: the possibility of arbitrarily or infinitely long sequences of rule firings. Specifically, we investigate the termination of rule firings, as well as their confluence, that is, the uniqueness of the state that is ultimately reached. Both problems turn out to be PSPACE-complete. After this, we devise practically interesting syntactic and structural restrictions that guarantee polynomial-time termination and confluence tests. Finally, in the context of planning languages that support indirect effects, we propose new implementation technologies.

A Fast Algorithm for k-Memory Messaging Scheme Design in Dynamic Environments with Uncertainty
Authors: Zhikang Fan (fanzhikang@ruc.edu.cn), Weiran Shen (shenweiran@ruc.edu.cn)
https://ojs.aaai.org/index.php/ICAPS/article/view/31475
We study the problem of designing the optimal k-memory messaging scheme in a dynamic environment. Specifically, a sender, who can perfectly observe the state of a dynamic environment but cannot take actions, aims to persuade an uninformed, far-sighted receiver to take actions to maximize the long-term utility of the sender, by sending messages. We focus on k-memory messaging schemes, i.e., at each time step, the sender's messaging scheme depends on information from the previous k steps. After receiving a message, the self-interested receiver derives a posterior belief and takes action. The immediate rewards of the two players can be misaligned, so the sender needs to ensure persuasiveness when designing the messaging scheme. We first formulate this problem as a bi-linear program. Then we show that there are infinitely many non-trivial persuasive messaging schemes for any problem instance. Moreover, we show that when the sender uses a k-memory messaging scheme, the optimal strategy for the receiver is also a k-memory strategy. We propose a fast heuristic algorithm for this problem and show that it can be extended to the setting where the sender has threat ability. We experimentally evaluate our algorithm, comparing it with the solution obtained by the Gurobi solver, in terms of performance and running time, in both settings. Extensive experimental results show that our algorithm outperforms the Gurobi solution in running time, yet achieves comparable performance.

SLAMuZero: Plan and Learn to Map for Joint SLAM and Navigation
Authors: Bowen Fang (bf2504@columbia.edu), Xu Chen (xc2412@columbia.edu), Zhengkun Pan (zp2243@columbia.edu), Xuan Di (sharon.di@columbia.edu)
https://ojs.aaai.org/index.php/ICAPS/article/view/31476
MuZero has demonstrated remarkable performance in board and video games, where the Monte Carlo tree search (MCTS) method is utilized to learn and adapt to different game environments. This paper leverages the strength of MuZero to enhance agents' planning capability for joint active simultaneous localization and mapping (SLAM) and navigation tasks, which require an agent to navigate an unknown environment while simultaneously constructing a map and localizing itself. We propose SLAMuZero, a novel approach for joint SLAM and navigation, which employs a search process that uses an explicit encoder-decoder architecture for mapping, followed by a prediction function to evaluate policy and value based on the generated map. SLAMuZero outperforms the state-of-the-art baseline and significantly reduces training time, underscoring the efficiency of our approach. Additionally, we develop a new open source library for implementing SLAMuZero, which is a flexible and modular toolkit for researchers and practitioners (https://github.com/bwfbowen/SLAMuZero).

A Real-Time Rescheduling Algorithm for Multi-robot Plan Execution
Authors: Ying Feng (yingfeng@andrew.cmu.edu), Adittyo Paul (adittyop@andrew.cmu.edu), Zhe Chen (zhe.chen@monash.edu), Jiaoyang Li (jiaoyangli@cmu.edu)
https://ojs.aaai.org/index.php/ICAPS/article/view/31477
One area of research in multi-agent path finding is to determine how replanning can be efficiently achieved in the case of agents being delayed during execution. One option is to reschedule the passing order of agents, i.e., the sequence in which agents visit the same location. In response, we propose Switchable-Edge Search (SES), an A*-style algorithm designed to find optimal passing orders. We prove the optimality of SES and evaluate its efficiency via simulations. The best variant of SES takes less than 1 second for small- and medium-sized problems and runs up to 4 times faster than baselines for large-sized problems.

Towards Feasible Higher-Dimensional Potential Heuristics
Authors: Daniel Fišer (danfis@danfis.cz), Marcel Steinmetz (marcel.steinmetz@laas.fr)
https://ojs.aaai.org/index.php/ICAPS/article/view/31478
Potential heuristics assign numerical values (potentials) to state features, where each feature is a conjunction of facts. It was previously shown that the informativeness of potential heuristics can be significantly improved by considering complex features, but computing potentials over all pairs of facts is already too costly in practice. In this paper, we investigate whether using just a few high-dimensional features instead of all conjunctions up to a dimension n can result in improved heuristics while keeping the computational cost at bay. We focus on (a) establishing a framework for studying this kind of potential heuristics, and (b) whether it is reasonable to expect improvement with just a few conjunctions. For (a), we propose two compilations that encode each conjunction explicitly as a new fact so that we can compute potentials over conjunctions in the original task as one-dimensional potentials in the compilation. Regarding (b), we provide evidence that the informativeness of potential heuristics can be significantly increased with a small set of conjunctions, and that these improvements have a positive impact on the number of solved tasks.
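Editor's illustration (not from the paper above): evaluating a potential heuristic with conjunctive features amounts to summing the potentials of the features that are true in a state. The facts, features, and potential values below are invented; the paper is about how to choose a few informative high-dimensional features, not about this evaluation step.

```python
from typing import Dict, FrozenSet, Set

Feature = FrozenSet[str]          # a feature is a conjunction of facts

def potential_heuristic(state: Set[str], potentials: Dict[Feature, float]) -> float:
    """Sum the potentials of every feature whose facts all hold in the state."""
    return sum(w for feature, w in potentials.items() if feature <= state)

potentials = {
    frozenset({"at(truck,A)"}): 2.0,                   # one-dimensional feature
    frozenset({"at(truck,A)", "empty(truck)"}): 1.5,   # two-dimensional feature
}
state = {"at(truck,A)", "empty(truck)", "at(pkg,B)"}
print(potential_heuristic(state, potentials))  # 3.5
```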
Progressive State Space Disaggregation for Infinite Horizon Dynamic Programming
Authors: Orso Forghieri (orso.forghieri@gmail.com), Hind Castel (hind.castel@telecom-sudparis.eu), Emmanuel Hyon (ehyon@parisnanterre.fr), Erwan Le Pennec (erwan.le-pennec@polytechnique.edu)
https://ojs.aaai.org/index.php/ICAPS/article/view/31479
The high dimensionality of model-based Reinforcement Learning and Markov Decision Processes can be reduced using abstractions of the state and action spaces. Although hierarchical learning and state abstraction methods have been explored over the past decades, explicit methods to build useful abstractions of models are rarely provided. In this work, we provide a new state abstraction method for solving infinite horizon problems in the discounted and total settings. Our approach is to progressively disaggregate abstract regions by iteratively slicing aggregations of states relative to a value function. The distinguishing feature of our method, in contrast to previous approximations of the Bellman operator, is the disaggregation of regions during value function iterations (or policy evaluation steps). The objective is to find a more efficient aggregation that reduces the error on each piece of the partition. We provide a proof of convergence for this algorithm without making any assumptions about the structure of the problem. We also show that this process decreases the computational complexity of the Bellman operator iteration and provides useful abstractions. We then plug this state space disaggregation process into classical Dynamic Programming algorithms, namely Approximate Value Iteration, Q-Value Iteration, and Policy Iteration. Finally, we conduct a numerical comparison on randomly generated MDPs as well as classical MDPs. These experiments show that our policy-based algorithm is faster than both the traditional dynamic programming approach and recent aggregative methods that use a fixed number of adaptive partitions.

JaxPlan and GurobiPlan: Optimization Baselines for Replanning in Discrete and Mixed Discrete-Continuous Probabilistic Domains
Authors: Michael Gimelfarb (mike.gimelfarb@mail.utoronto.ca), Ayal Taitler (ataitler@gmail.com), Scott Sanner (ssanner@gmail.com)
https://ojs.aaai.org/index.php/ICAPS/article/view/31480
Replanning methods that determinize a stochastic planning problem and replan at each action step have long been known to provide strong baseline (and even competition winning) solutions to discrete probabilistic planning problems. Recent work has explored the extension of replanning methods to the case of mixed discrete-continuous probabilistic domains by leveraging MILP compilations of the RDDL specification language. Other recent advances in probabilistic planning have explored the compilation of structured mixed discrete-continuous RDDL domains into a determinized computation graph that also lends itself to replanning via so-called planning by backpropagation methods. However, to date, there has not been any comprehensive comparison of these recent optimization-based replanning methodologies to the state-of-the-art winner of the discrete probabilistic IPC 2011 and 2014 and runner-up in 2018 (PROST) and the winner of the mixed discrete-continuous probabilistic IPC 2023 (DiSProd). In this paper, we describe JaxPlan, which makes several extensive upgrades to planning by backpropagation and its compact tensorized compilation from RDDL to a JAX computation graph that uses discrete relaxations and a sample average approximation. We also provide the first detailed overview of a compilation of the RDDL language specification to Gurobi's Mixed Integer Nonlinear Programming (MINLP) solver that we term GurobiPlan. We provide a comprehensive comparative analysis of JaxPlan and GurobiPlan with competition winning planners on 19 domains and a total of 155 instances to assess their performance across (a) different domains, (b) different instance sizes, and (c) different time budgets. We also release all code to reproduce the results along with the open-source planners we describe in this work.

Formal Representations of Classical Planning Domains
Authors: Claudia Grundke (claudia.grundke@unibas.ch), Gabriele Röger (gabriele.roeger@unibas.ch), Malte Helmert (malte.helmert@unibas.ch)
https://ojs.aaai.org/index.php/ICAPS/article/view/31481
Planning domains are an important notion, e.g. when it comes to restricting the input for generalized planning or learning approaches. However, domains as specified in PDDL cannot fully capture the intuitive understanding of a planning domain. We close this semantic gap and propose using PDDL axioms to characterize the (typically infinite) set of legal tasks of a domain. A minor extension makes it possible to express all properties that can be determined in polynomial time. We demonstrate the suitability of the approach on established domains from the International Planning Competition.

Safe Explicable Planning
Authors: Akkamahadevi Hanni (ahanni@asu.edu), Andrew Boateng (aoboaten@asu.edu), Yu Zhang (yzhan442@asu.edu)
https://ojs.aaai.org/index.php/ICAPS/article/view/31482
Human expectations arise from their understanding of others and the world. In the context of human-AI interaction, this understanding may not align with reality, leading to the AI agent failing to meet expectations and compromising team performance. Explicable planning, introduced as a method to bridge this gap, aims to reconcile human expectations with the agent's optimal behavior, facilitating interpretable decision-making. However, an unresolved critical issue is ensuring safety in explicable planning, as it could result in explicable behaviors that are unsafe. To address this, we propose Safe Explicable Planning (SEP), which extends the prior work to support the specification of a safety bound. The goal of SEP is to find behaviors that align with human expectations while adhering to the specified safety criterion. Our approach generalizes the consideration of multiple objectives stemming from multiple models rather than a single model, yielding a Pareto set of safe explicable policies. We present both an exact method, guaranteeing finding the Pareto set, and a more efficient greedy method that finds one of the policies in the Pareto set. Additionally, we offer approximate solutions based on state aggregation to improve scalability. We provide formal proofs that validate the desired theoretical properties of these methods. Evaluation through simulations and physical robot experiments confirms the effectiveness of our approach for safe explicable planning.

Replanning in Advance for Instant Delay Recovery in Multi-Agent Applications: Rerouting Trains in a Railway Hub
Authors: Issa K. Hanou (i.k.hanou@tudelft.nl), Devin Wild Thomas (dwt@cs.unh.edu), Wheeler Ruml (ruml@cs.unh.edu), Mathijs de Weerdt (m.m.deweerdt@tudelft.nl)
https://ojs.aaai.org/index.php/ICAPS/article/view/31483
Train routing is sensitive to delays that occur in the network. When a train is delayed, it is imperative that a new plan be found quickly, or else other trains may need to be stopped to ensure safety, potentially causing cascading delays. In this paper, we consider this class of multi-agent planning problems, which we call Multi-Agent Execution Delay Replanning. We show that these can be solved by reducing the problem to an any-start-time safe interval planning problem. When an agent has an any-start-time plan, it can react to a delay by simply looking up the precomputed plan for the delayed start time. We identify crucial real-world problem characteristics like the agent's speed, size, and safety envelope, and extend the any-start-time planning to account for them. Experimental results on real-world train networks show that any-start-time plans are compact and can be computed in reasonable time while enabling agents to instantly recover a safe plan.

An Analysis of the Decidability and Complexity of Numeric Additive Planning
Authors: Hayyan Helal (helal@kbsg.rwth-aachen.de), Gerhard Lakemeyer (gerhard@cs.rwth-aachen.de)
https://ojs.aaai.org/index.php/ICAPS/article/view/31484
In this paper, we first define numeric additive planning (NAP), a planning formulation equivalent to Hoffmann's Restricted Tasks over Integers. Then, we analyze the minimal number of action repetitions required for a solution, since planning turns out to be decidable as long as such numbers can be calculated for all actions. We differentiate between two kinds of repetitions and solve for one by integer linear programming and the other by search. Additionally, we characterize the differences between propositional planning and NAP regarding these two kinds. To achieve this, we define so-called multi-valued partial order plans, a novel compact plan representation. Finally, we consider decidable fragments of NAP and their complexity.

Versatile Cost Partitioning with Exact Sensitivity Analysis
Authors: Paul Höft (paul.hoft@liu.se), David Speck (david.speck@liu.se), Florian Pommerening (florian.pommerening@unibas.ch), Jendrik Seipp (jendrik.seipp@liu.se)
https://ojs.aaai.org/index.php/ICAPS/article/view/31485
Saturated post-hoc optimization is a powerful method for computing admissible heuristics for optimal classical planning. The approach solves a linear program (LP) for each state encountered during the search, which is computationally demanding. In this paper, we theoretically and empirically analyze to what extent we can reuse an LP solution of one state for another. We introduce a novel sensitivity analysis that can exactly characterize the set of states for which a unique LP solution is optimal. Furthermore, we identify two properties of the underlying LPs that affect reusability. Finally, we introduce an algorithm that optimizes LP solutions to generalize well to other states. Our new algorithms significantly reduce the number of necessary LP computations.

Expressiveness of Graph Neural Networks in Planning Domains
Authors: Rostislav Horčík (rostislav.horcik@gmail.com), Gustav Šír (gustav.sir@cvut.cz)
https://ojs.aaai.org/index.php/ICAPS/article/view/31486
Graph Neural Networks (GNNs) have become the standard method of choice for learning with structured data, demonstrating particular promise in classical planning. Their inherent invariance under symmetries of the input graphs endows them with superior generalization capabilities compared to their symmetry-oblivious counterparts. However, this comes at the cost of limited expressive power. In particular, GNNs cannot distinguish between graphs that satisfy identical sentences of C2 logic. To leverage GNNs for learning policies in PDDL domains, one needs to encode the contextual representation of the planning states as graphs. The expressiveness of this encoding, coupled with a specific GNN architecture, then hinges on the absence of indistinguishable states necessitating distinct actions. This paper provides a comprehensive theoretical and statistical exploration of such situations in PDDL domains across diverse natural encoding schemes and GNN models.

Converting Simple Temporal Networks with Uncertainty into Minimal Equivalent Dispatchable Form
Authors: Luke Hunsberger (hunsberger@vassar.edu), Roberto Posenato (roberto.posenato@univr.it)
https://ojs.aaai.org/index.php/ICAPS/article/view/31487
A Simple Temporal Network with Uncertainty (STNU) is a structure for representing and reasoning about time constraints on actions that may have uncertain durations. An STNU is dynamically controllable (DC) if there exists a dynamic strategy for executing the network that guarantees that all of its constraints will be satisfied no matter how the uncertain durations turn out, within their specified bounds. However, such strategies typically require exponential space. Therefore, converting a DC STNU into a so-called dispatchable form is essential for practical applications. The relevant portions of a real-time execution strategy for a dispatchable STNU can be incrementally constructed during execution, requiring only O(n²) space, while also providing maximum flexibility and minimal computation during the execution of the network. Although existing algorithms can generate equivalent dispatchable STNUs, they do not guarantee a minimal number of edges in the STNU graph. Since the number of edges directly impacts the computations during execution, this paper presents a novel algorithm for converting any dispatchable STNU into an equivalent dispatchable network having a minimal number of edges. The complexity of the algorithm is O(kn³), where k is the number of actions with uncertain durations, and n is the number of timepoints in the network. The paper also provides an empirical evaluation of the reduction in edges achieved by the new algorithm.
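Editor's background sketch (not from the paper above): a plain Simple Temporal Network, without uncertainty, can be viewed as a distance graph and checked for consistency by detecting negative cycles with Bellman-Ford. Dispatchability and dynamic controllability of STNUs, the topic of the entry above, need considerably more machinery than this; the example constraints are invented.

```python
def stn_consistent(num_timepoints: int, edges) -> bool:
    """edges: list of (u, v, w) meaning time[v] - time[u] <= w.
    Consistent iff the distance graph contains no negative cycle (Bellman-Ford)."""
    dist = [0.0] * num_timepoints          # a virtual source connected to all timepoints
    for _ in range(num_timepoints):        # relax all edges enough times
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    # If any edge can still be relaxed, there is a negative cycle: inconsistent.
    return all(dist[u] + w >= dist[v] for u, v, w in edges)

# B - A <= 10 and A - B <= -3 (B at least 3 after A): consistent.
print(stn_consistent(2, [(0, 1, 10), (1, 0, -3)]))   # True
# Adding A - B <= -20 (B at least 20 after A) contradicts B - A <= 10: inconsistent.
print(stn_consistent(2, [(0, 1, 10), (1, 0, -20)]))  # False
```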
Rethinking Mutual Information for Language Conditioned Skill Discovery on Imitation Learning
Authors: Zhaoxun Ju (dljzx@hotmail.com), Chao Yang (yangchao@pjlab.org.cn), Fuchun Sun (fcsun@mail.tsinghua.edu.cn), Hongbo Wang (wanghongbo@fudan.edu.cn), Yu Qiao (qiaoyu@pjlab.org.cn)
https://ojs.aaai.org/index.php/ICAPS/article/view/31488
Language-conditioned robot behavior plays a vital role in executing complex tasks by associating human commands or instructions with perception and actions. The ability to compose long-horizon tasks based on unconstrained language instructions necessitates the acquisition of a diverse set of general-purpose skills. However, acquiring inherent primitive skills in a coupled and long-horizon environment without external rewards or human supervision presents significant challenges. In this paper, we evaluate the relationship between skills and language instructions from a mathematical perspective, employing two forms of mutual information within the framework of language-conditioned policy learning. To maximize the mutual information between language and skills in an unsupervised manner, we propose an end-to-end imitation learning approach known as Language Conditioned Skill Discovery (LCSD). Specifically, we utilize vector quantization to learn discrete latent skills and leverage skill sequences of trajectories to reconstruct high-level semantic instructions. Through extensive experiments on language-conditioned robotic navigation and manipulation tasks, encompassing BabyAI, LORel, and Calvin, we demonstrate the superiority of our method over prior works. Our approach exhibits enhanced generalization capabilities towards unseen tasks, improved skill interpretability, and notably higher rates of task completion success.

Epistemic Exploration for Generalizable Planning and Learning in Non-Stationary Settings
Authors: Rushang Karia (rushang.karia@asu.edu), Pulkit Verma (verma.pulkit@asu.edu), Alberto Speranzon (alberto.speranzon@gmail.com), Siddharth Srivastava (siddharths@asu.edu)
https://ojs.aaai.org/index.php/ICAPS/article/view/31489
This paper introduces a new approach for continual planning and model learning in relational, non-stationary stochastic environments. Such capabilities are essential for the deployment of sequential decision-making systems in the uncertain and constantly evolving real world. Working in such practical settings with unknown (and non-stationary) transition systems and changing tasks, the proposed framework models gaps in the agent's current state of knowledge and uses them to conduct focused, investigative explorations. Data collected using these explorations is used for learning generalizable probabilistic models for solving the current task despite continual changes in the environment dynamics. Empirical evaluations on several non-stationary benchmark domains show that this approach significantly outperforms planning and RL baselines in terms of sample complexity. Theoretical results show that the system exhibits desirable convergence properties when stationarity holds.

Unifying and Certifying Top-Quality Planning
Authors: Michael Katz (ctpelok@gmail.com), Junkyu Lee (junkyu.lee@ibm.com), Shirin Sohrabi (ssohrab@us.ibm.com)
https://ojs.aaai.org/index.php/ICAPS/article/view/31490
The growing utilization of planning tools in practical scenarios has sparked an interest in generating multiple high-quality plans. Consequently, a range of computational problems under the general umbrella of top-quality planning were introduced over a short time period, each with its own definition. In this work, we show that the existing definitions can be unified into one, based on a dominance relation. The different computational problems, therefore, simply correspond to different dominance relations. Given the unified definition, we can now certify the top-quality of the solutions, leveraging existing certification of unsolvability and optimality. We show that task transformations found in the existing literature can be employed for the efficient certification of various top-quality planning problems and propose a novel transformation to efficiently certify loopless top-quality planning.

Explaining Plan Quality Differences
Authors: Benjamin Krarup (benjamin.krarup@kcl.ac.uk), Amanda Coles (amanda.coles@kcl.ac.uk), Derek Long (derek.long@kcl.ac.uk), David E. Smith (david.smith@psresearch.xyz)
https://ojs.aaai.org/index.php/ICAPS/article/view/31491
We describe a method for explaining the differences between the quality of plans produced for similar planning problems. The method exploits a process of abstracting away details of the planning problems until the difference in solution quality they support has been minimised. We give a general definition of a valid abstraction of a planning problem. We then give the details of the implementation of a number of useful abstractions. Finally, we present a breadth-first search algorithm for finding suitable abstractions for explanations and detail the results of an evaluation of the approach.

Planning with a Learned Policy Basis to Optimally Solve Complex Tasks
Authors: David Kuric (d.kuric@uva.nl), Guillermo Infante (guillermo.infante@upf.edu), Vicenç Gómez (vicen.gomez@upf.edu), Anders Jonsson (anders.jonsson@upf.edu), Herke van Hoof (h.c.vanhoof@uva.nl)
https://ojs.aaai.org/index.php/ICAPS/article/view/31492
Conventional reinforcement learning (RL) methods can successfully solve a wide range of sequential decision problems. However, learning policies that can generalize predictably across multiple tasks in a setting with non-Markovian reward specifications is a challenging problem. We propose to use successor features to learn a set of local policies that each solves a well-defined subproblem. In a task described by a finite state automaton (FSA) that involves the same set of subproblems, the combination of these local policies can then be used to generate an optimal solution without additional learning. In contrast to other methods that combine local policies via planning, our method asymptotically attains global optimality, even in stochastic environments.

Action Model Learning from Noisy Traces: a Probabilistic Approach
Authors: Leonardo Lamanna (llamanna@fbk.eu), Luciano Serafini (serafini@fbk.eu)
https://ojs.aaai.org/index.php/ICAPS/article/view/31493
We address the problem of learning planning domains from plan traces that are obtained by observing the environment states through noisy sensors. In such situations, approaches that assume correct traces are not applicable. We tackle the problem by designing a probabilistic graphical model where the preconditions and effects of every planning domain operator, as well as the traces' observations, are modeled by random variables. Probabilistic inference conditioned on the observed traces allows our approach to derive a posterior probability of an atom being a precondition and/or an effect of an operator. Planning domains are obtained either by sampling or by applying the maximum a posteriori criterion. We compare our approach with a frequentist baseline and the currently available state-of-the-art approaches. We measure the performance of each method according to two criteria: reconstruction of the original planning domain and effectiveness in solving new planning problems of the same domain. Our experimental analysis shows that our approach learns action models that are more accurate than those of state-of-the-art approaches, and it strongly outperforms other approaches in generating models that are effective for solving new problems.

Neural Combinatorial Optimization on Heterogeneous Graphs: An Application to the Picker Routing Problem in Mixed-shelves Warehouses
Authors: Laurin Luttmann (laurin.luttmann@leuphana.de), Lin Xie (lin.xie@utwente.nl)
https://ojs.aaai.org/index.php/ICAPS/article/view/31494
In recent years, machine learning (ML) models capable of solving combinatorial optimization (CO) problems have received a surge of attention. While early approaches failed to outperform traditional CO solvers, the gap between handcrafted and learned heuristics has been steadily closing. However, most work in this area has focused on simple CO problems to benchmark new models and algorithms, leaving a gap in the development of methods specifically designed to handle more involved problems. Therefore, this work considers the problem of picker routing in the context of mixed-shelves warehouses, which involves not only a heterogeneous graph representation, but also a combinatorial action space resulting from the integrated selection and routing decisions to be made. We propose both a novel encoder to effectively learn representations of the heterogeneous graph and a hierarchical decoding scheme that exploits the combinatorial structure of the action space. The efficacy of the developed methods is demonstrated through a comprehensive comparison with established architectures as well as exact and heuristic solvers.

Investigating Large Neighbourhood Search for Bus Driver Scheduling
Authors: Tommaso Mannelli Mazzoli (tommaso.mazzoli@tuwien.ac.at), Lucas Kletzander (lucas.kletzander@tuwien.ac.at), Pascal Van Hentenryck (pascal.vanhentenryck@isye.gatech.edu), Nysret Musliu (nysret.musliu@tuwien.ac.at)
https://ojs.aaai.org/index.php/ICAPS/article/view/31495
The Bus Driver Scheduling Problem (BDSP) is a combinatorial optimisation problem with high practical relevance. The aim is to assign bus drivers to predetermined routes while minimising a specified objective function that considers operating costs as well as employee satisfaction. Since we must satisfy several rules from a collective agreement and European regulations, the BDSP is highly constrained. Hence, using exact methods to solve large real-life-based instances is computationally too expensive, while heuristic methods still have a considerable gap to the optimum. Our paper presents a Large Neighbourhood Search (LNS) approach to solve the BDSP. We propose several novel destroy operators and an approach using column generation to repair the sub-problem. We analyse the impact of the destroy and repair operators and investigate various possibilities to select them, including adaptivity. The proposed approach improves all the upper bounds for larger instances that exact methods cannot solve, as well as for some mid-sized instances, and outperforms existing heuristic approaches for this problem on all benchmark instances.

Weak and Strong Reversibility of Non-deterministic Actions: Universality and Uniformity
Authors: Jakub Med (jakub.med@cvut.cz), Lukáš Chrpa (chrpaluk@cvut.cz), Michael Morak (michael.morak@aau.at), Wolfgang Faber (wf@wfaber.com)
https://ojs.aaai.org/index.php/ICAPS/article/view/31496
Classical planning looks for a sequence of actions that transforms the initial state of the environment into a goal state. Studying whether the effects of an action can be undone by a sequence of other actions, that is, action reversibility, is beneficial, for example, in determining whether an action is safe to apply. This paper deals with action reversibility of non-deterministic actions, i.e., actions whose application might result in different outcomes. Inspired by the established notions of weak and strong plans in non-deterministic (or FOND) planning, we define the notions of weak and strong reversibility for non-deterministic actions. We then focus on the universality and uniformity of action reversibility, that is, whether we can always undo all possible effects of the action by the same means (i.e., policy), or whether some of the effects can never be undone. We show how these classes of problems can be solved via classical or FOND planning and evaluate our approaches on FOND benchmark domains.

Preference Explanation and Decision Support for Multi-Objective Real-World Test Laboratory Scheduling
Authors: Florian Mischek (fmischek@dbai.tuwien.ac.at), Nysret Musliu (nysret.musliu@tuwien.ac.at)
https://ojs.aaai.org/index.php/ICAPS/article/view/31497
Complex real-world scheduling problems often include multiple conflicting objectives. Decision makers (DMs) can express their preferences over those objectives in different ways, including as sets of weights which are used in a linear combination of objective values. However, finding good sets of weights that result in solutions with desirable qualities is challenging and currently involves a lot of trial and error. We propose a general method to explain objectives' values under a given set of weights using Shapley regression values. We demonstrate this approach on the Test Laboratory Scheduling Problem (TLSP), for which we propose a multi-objective solution algorithm and show that suggestions for weight adjustments based on the introduced explanations are successful in guiding decision makers towards solutions that match their expectations. This method is included in the TLSP MO-Explorer, a new decision support system that enables the exploration and analysis of high-dimensional Pareto fronts.

Safe Learning of PDDL Domains with Conditional Effects
Authors: Argaman Mordoch (mordocha@post.bgu.ac.il), Enrico Scala (enrico.scala@unibs.it), Roni Stern (roni.stern@gmail.com), Brendan Juba (bjuba@wustl.edu)
https://ojs.aaai.org/index.php/ICAPS/article/view/31498
Powerful domain-independent planners have been developed to solve various types of planning problems.
These planners often require a model of the acting agent's actions, given in some planning domain description language. Manually designing such an action model is a notoriously challenging task. An alternative is to automatically learn action models from observation. Such an action model is called safe if every plan created with it is consistent with the real, unknown action model. Algorithms for learning such safe action models exist, yet they cannot handle domains with conditional or universal effects, which are common constructs in many planning problems. We prove that learning non-trivial safe action models with conditional effects may require an exponential number of samples. Then, we identify reasonable assumptions under which such learning is tractable and propose Conditional-SAM, the first algorithm capable of doing so. We analyze Conditional-SAM theoretically and evaluate it experimentally. Our results show that the action models learned by Conditional-SAM can be used to perfectly solve most of the test-set problems in most of the evaluated domains.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31499SKATE: Successive Rank-based Task Assignment for Proactive Online Planning2024-05-30T05:52:21-07:00Déborah Conforto Nedelmanndeborah.conforto-nedelmann@isae-supaero.frJérôme Lacanjerome.lacan@isae-supaero.frCaroline P. C. Chanelcaroline.chanel@isae-supaero.frThe development of online applications for services such as package delivery, crowdsourcing, or taxi dispatching has drawn the attention of the research community to the domain of online multi-agent multi-task allocation. In online service applications, tasks (or requests) to be performed arrive over time and need to be dynamically assigned to agents. Such planning problems are challenging because: (i) little or no information about future tasks is available for long-term reasoning; (ii) the number of agents, as well as the number of tasks, can be very high; and (iii) an efficient solution has to be reached in a limited amount of time. In this paper, we propose SKATE, a successive rank-based task assignment algorithm for online multi-agent planning. SKATE can be seen as a meta-heuristic approach which successively assigns a task to the best-ranked agent until all tasks have been assigned. We assessed the complexity of SKATE and showed it is cubic in the number of agents and tasks. To investigate how multi-agent multi-task assignment algorithms perform under a high number of agents and tasks, we compare three multi-task assignment methods in synthetic and real data benchmark environments: Integer Linear Programming (ILP), Genetic Algorithm (GA), and SKATE. In addition, a proactive approach is embedded in all methods to determine near-future available agents (resources) using a receding-horizon. Based on the results obtained, we can argue that the classical ILP offers the best-quality solutions when handling a low number of agents and tasks, i.e. low load, regardless of the receding-horizon size, while it struggles to respect the time constraint under high load.
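To make the successive rank-based assignment idea in the SKATE abstract above concrete, the following is a minimal editorial sketch, not SKATE itself: it assumes a generic score(agent, task) ranking function (a placeholder, not the paper's ranking) and simply gives each task, in turn, to the currently best-ranked free agent; SKATE's proactive, receding-horizon handling of future tasks is not modeled.

```python
from typing import Callable, Dict, Hashable, List

def successive_rank_based_assignment(
    agents: List[Hashable],
    tasks: List[Hashable],
    score: Callable[[Hashable, Hashable], float],
) -> Dict[Hashable, Hashable]:
    """Greedy sketch: assign each task, in turn, to the best-ranked free agent.

    `score` is a placeholder ranking function (higher is better); it stands in
    for whatever ranking an actual online allocator would use.
    """
    free_agents = set(agents)
    assignment: Dict[Hashable, Hashable] = {}
    for task in tasks:
        if not free_agents:
            break  # no agent left for the remaining tasks
        # Rank all currently free agents for this task and pick the best one.
        best_agent = max(free_agents, key=lambda a: score(a, task))
        assignment[task] = best_agent
        free_agents.remove(best_agent)
    return assignment

# Toy usage: rank agents by (negated) distance to the task location.
if __name__ == "__main__":
    positions = {"a1": 0.0, "a2": 5.0, "a3": 9.0}
    task_locations = {"t1": 8.0, "t2": 1.0}
    result = successive_rank_based_assignment(
        agents=list(positions),
        tasks=list(task_locations),
        score=lambda a, t: -abs(positions[a] - task_locations[t]),
    )
    print(result)  # {'t1': 'a3', 't2': 'a1'}
```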
SKATE performs better than the other methods in high load conditions, and even better when a variable receding-horizon is used.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31500Incremental Ordering for Scheduling Problems2024-05-30T05:52:23-07:00Stefan Neubertstefan.neubert@hpi.deKatrin Caselkatrin.casel@hu-berlin.deGiven an instance of a scheduling problem where we want to start executing jobs as soon as possible, it is advantageous if a scheduling algorithm emits the first parts of its solution early, in particular before the algorithm completes its work. Therefore, in this position paper, we analyze core scheduling problems in regards to their enumeration complexity, i.e. the computation time to the first emitted schedule entry (preprocessing time) and the worst case time between two consecutive parts of the solution (delay). Specifically, we look at scheduling instances that reduce to ordering problems. We apply a known incremental sorting algorithm for scheduling strategies that are at their core comparison-based sorting algorithms and translate corresponding upper and lower complexity bounds to the scheduling setting. For instances with n jobs and a precedence DAG with maximum degree Δ, we incrementally build a topological ordering with O(n) preprocessing and O(Δ) delay. We prove a matching lower bound and show with an adversary argument that the delay lower bound holds even in case the DAG has constant average degree and the ordering is emitted out-of-order in the form of insert operations. We complement our theoretical results with experiments that highlight the improved time-to-first-output and discuss research opportunities for similar incremental approaches for other scheduling problems.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31501Lookahead Pathology in Monte-Carlo Tree Search2024-05-30T05:52:24-07:00Khoi P. N. Nguyenkhoi.nguyen6@utdallas.eduRaghuram Ramanujanraramanujan@davidson.eduMonte-Carlo Tree Search (MCTS) is a search paradigm that first found prominence with its success in the domain of computer Go. Early theoretical work established the soundness and convergence bounds for Upper Confidence bounds applied to Trees (UCT), the most popular instantiation of MCTS; however, there remain notable gaps in our understanding of how UCT behaves in practice. In this work, we address one such gap by considering the question of whether UCT can exhibit lookahead pathology in adversarial settings --- a paradoxical phenomenon first observed in Minimax search where greater search effort leads to worse decision-making. We introduce a novel family of synthetic games that offer rich modeling possibilities while remaining amenable to mathematical analysis. 
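The emit-as-you-go idea in the Incremental Ordering for Scheduling Problems abstract above can be illustrated with a plain Kahn-style generator that yields one schedule entry at a time. This is an editorial sketch only; it does not reproduce the paper's O(n) preprocessing / O(Δ) delay construction, and the precedence encoding below is an assumption for illustration.

```python
from collections import deque
from typing import Dict, Hashable, Iterator, List

def incremental_topological_order(
    precedence: Dict[Hashable, List[Hashable]]
) -> Iterator[Hashable]:
    """Yield jobs one at a time in a precedence-respecting order.

    `precedence` maps each job to the jobs that must come after it (its
    successors in the precedence DAG). A job is emitted as soon as all of its
    predecessors have been emitted, so a consumer can start executing early
    entries before the whole ordering has been computed.
    """
    indegree: Dict[Hashable, int] = {job: 0 for job in precedence}
    for successors in precedence.values():
        for succ in successors:
            indegree[succ] = indegree.get(succ, 0) + 1
    ready = deque(job for job, deg in indegree.items() if deg == 0)
    emitted = 0
    while ready:
        job = ready.popleft()
        emitted += 1
        yield job
        for succ in precedence.get(job, []):
            indegree[succ] -= 1
            if indegree[succ] == 0:
                ready.append(succ)
    if emitted != len(indegree):
        raise ValueError("precedence constraints contain a cycle")

# Toy usage: 'prep' before 'build' and 'test'; 'build' before 'test'.
if __name__ == "__main__":
    dag = {"prep": ["build", "test"], "build": ["test"], "test": []}
    for entry in incremental_topological_order(dag):
        print(entry)  # prints prep, build, test, one entry at a time
```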
Our theoretical and experimental results suggest that UCT is indeed susceptible to pathological behavior in a range of games drawn from this family.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31502Large Language Models as Planning Domain Generators2024-05-30T05:52:26-07:00James Oswaldjamesoswald111@gmail.comKavitha Srinivaskavitha.srinivas@ibm.comHarsha Kokelharsha.kokel@ibm.comJunkyu Leejunkyu.lee@ibm.comMichael Katzctpelok@gmail.comShirin Sohrabissohrab@us.ibm.comDeveloping domain models is one of the few remaining places that require manual human labor in AI planning. Thus, in order to make planning more accessible, it is desirable to automate the process of domain model generation. To this end, we investigate if large language models (LLMs) can be used to generate planning domain models from simple textual descriptions. Specifically, we introduce a framework for automated evaluation of LLM-generated domains by comparing the sets of plans for domain instances. Finally, we perform an empirical analysis of 7 large language models, including coding and chat models across 9 different planning domains, and under three classes of natural language domain descriptions. Our results indicate that LLMs, particularly those with high parameter counts, exhibit a moderate level of proficiency in generating correct planning domains from natural language descriptions. Our code is available at https://github.com/IBM/NL2PDDL.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31503On the Prospects of Incorporating Large Language Models (LLMs) in Automated Planning and Scheduling (APS)2024-05-30T05:52:27-07:00Vishal Pallaganivishalp@mailbox.sc.eduBharath Chandra Muppasanibharath@email.sc.eduKaushik Roykaushikr@email.sc.eduFrancesco Fabianoffabiano@nmsu.eduAndrea Loreggiaandrea.loreggia@gmail.comKeerthiram Murugesankeerthi166@gmail.comBiplav Srivastavabiplav.srivastava@gmail.comFrancesca Rossifrancesca.rossi2@ibm.comLior Horeshlhoresh@us.ibm.comAmit Shethamit@sc.eduAutomated Planning and Scheduling is among the growing areas in Artificial Intelligence (AI) where mention of LLMs has gained popularity. Based on a comprehensive review of 126 papers, this paper investigates eight categories based on the unique applications of LLMs in addressing various aspects of planning problems: language translation, plan generation, model construction, multi-agent planning, interactive planning, heuristics optimization, tool integration, and brain-inspired planning. For each category, we articulate the issues considered and existing gaps. A critical insight resulting from our review is that the true potential of LLMs unfolds when they are integrated with traditional symbolic planners, pointing towards a promising neuro-symbolic approach. This approach effectively combines the generative aspects of LLMs with the precision of classical planning methods. By synthesizing insights from existing literature, we underline the potential of this integration to address complex planning challenges. Our goal is to encourage the ICAPS community to recognize the complementary strengths of LLMs and symbolic planners, advocating for a direction in automated planning that leverages these synergistic capabilities to develop more advanced and intelligent planning systems. 
We aim to keep the categorization of papers updated on https://ai4society.github.io/LLM-Planning-Viz/, a collaborative resource that allows researchers to contribute and add new literature to the categorization.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31504Transition Landmarks from Abstraction Cuts2024-05-30T05:52:30-07:00Florian Pommereningflorian.pommerening@unibas.chClemens Büchnerclemens.buechner@unibas.chThomas Kellertho.keller@unibas.chWe introduce transition-counting constraints as a principled tool to formalize constraints that must hold in every solution of a transition system. We then show how to obtain transition landmark constraints from abstraction cuts. Transition landmarks dominate operator landmarks in theory but require solving a linear program that is prohibitively large in practice. We compare different constraints that project away transition-counting variables and then further relax the constraint. For one important special case, we provide a lossless projection. We finally discuss efficient data structures to derive cuts from abstractions and store them in a way that avoids repeated computation in every state. We compare the resulting heuristics both theoretically and on benchmarks from the international planning competition.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31505Computing Planning Centroids and Minimum Covering States Using Symbolic Bidirectional Search2024-05-30T05:52:31-07:00Alberto Pozancoalberto.pozancolancho@jpmorgan.comÁlvaro Torralbaalto@cs.aau.dkDaniel Borrajodaniel.borrajo@jpmchase.comIn some scenarios, planning agents might be interested in reaching states that keep certain relationships with respect to a set of goals. Recently, two of these types of states were proposed: centroids, which minimize the average distance to the goals; and minimum covering states, which minimize the maximum distance to the goals. Previous approaches compute these states by searching forward either in the original or a reformulated task. In this paper, we propose several algorithms that use symbolic bidirectional search to efficiently compute centroids and minimum covering states. Experimental results in existing and novel benchmarks show that our algorithms scale much better than previous approaches, establishing a new state-of-the-art technique for this problem.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31506SayNav: Grounding Large Language Models for Dynamic Planning to Navigation in New Environments2024-05-30T05:52:32-07:00Abhinav Rajvanshiabhinav.rajvanshi@sri.comKaran Sikkakaran.sikka@sri.comXiao Linxiao.lin@sri.comBhoram Leebhoram.lee@sri.comHan-Pang Chiuhchiu@sarnoff.comAlvaro Velasquezalvarovelasquezucf@gmail.comSemantic reasoning and dynamic planning capabilities are crucial for an autonomous agent to perform complex navigation tasks in unknown environments. It requires a large amount of common-sense knowledge, that humans possess, to succeed in these tasks. We present SayNav, a new approach that leverages human knowledge from Large Language Models (LLMs) for efficient generalization to complex navigation tasks in unknown large-scale environments. 
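The definitions in the Computing Planning Centroids and Minimum Covering States abstract above translate directly into a selection rule: among candidate states, a centroid minimizes the average distance to the goals and a minimum covering state minimizes the maximum distance. The sketch below assumes a precomputed distance table and is illustrative only; the paper's symbolic bidirectional search, which is where the actual work lies, is not modeled.

```python
from statistics import mean
from typing import Dict, Hashable, List, Tuple

def centroid_and_min_covering(
    dist: Dict[Hashable, Dict[Hashable, float]],
    goals: List[Hashable],
) -> Tuple[Hashable, Hashable]:
    """Pick a centroid (minimum average goal distance) and a minimum covering
    state (minimum maximum goal distance) from a precomputed distance table.

    `dist[s][g]` is assumed to hold the distance from candidate state `s` to
    goal `g`; computing these distances efficiently is the hard part that the
    paper addresses with symbolic bidirectional search.
    """
    centroid = min(dist, key=lambda s: mean(dist[s][g] for g in goals))
    covering = min(dist, key=lambda s: max(dist[s][g] for g in goals))
    return centroid, covering

# Toy usage with three candidate states and two goals.
if __name__ == "__main__":
    table = {
        "s1": {"g1": 1.0, "g2": 9.0},  # average 5.0, maximum 9.0
        "s2": {"g1": 4.0, "g2": 5.0},  # average 4.5, maximum 5.0
        "s3": {"g1": 6.0, "g2": 2.0},  # average 4.0, maximum 6.0
    }
    print(centroid_and_min_covering(table, goals=["g1", "g2"]))  # ('s3', 's2')
```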
SayNav uses a novel grounding mechanism that incrementally builds a 3D scene graph of the explored environment as input to LLMs for generating feasible and contextually appropriate high-level plans for navigation. The LLM-generated plan is then executed by a pre-trained low-level planner that treats each planned step as a short-distance point-goal navigation sub-task. SayNav dynamically generates step-by-step instructions during navigation and continuously refines future steps based on newly perceived information. We evaluate SayNav on the multi-object navigation (MultiON) task, which requires the agent to utilize a massive amount of human knowledge to efficiently search for multiple different objects in an unknown environment. We also introduce a benchmark dataset for the MultiON task employing the ProcTHOR framework, which provides large photo-realistic indoor environments with a variety of objects. SayNav achieves state-of-the-art results and even outperforms an oracle-based baseline with strong ground-truth assumptions by more than 8% in terms of success rate, highlighting its ability to generate dynamic plans for successfully locating objects in large-scale new environments. The code, benchmark dataset and demonstration videos are accessible at https://www.sri.com/ics/computer-vision/saynav.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31507Online Control of Adaptive Large Neighborhood Search Using Deep Reinforcement Learning2024-05-30T05:52:35-07:00Robbert Reijnenr.v.j.reijnen@tue.nlYingqian Zhangyqzhang@tue.nlHoong Chuin Lauhclau@smu.edu.sgZaharah Bukhshz.bukhsh@tue.nlThe Adaptive Large Neighborhood Search (ALNS) algorithm has shown considerable success in solving combinatorial optimization problems (COPs). Nonetheless, the performance of ALNS relies on the proper configuration of its selection and acceptance parameters, which is known to be a complex and resource-intensive task. To address this, we introduce a Deep Reinforcement Learning (DRL) based approach called DR-ALNS that selects operators, adjusts parameters, and controls the acceptance criterion throughout the search. The proposed method aims to learn, based on the state of the search, to configure ALNS for the next iteration to yield more effective solutions for the given optimization problem. We evaluate the proposed method on an orienteering problem with stochastic weights and time windows, as presented in an IJCAI competition. The results show that our approach outperforms vanilla ALNS, ALNS tuned with Bayesian optimization, and two state-of-the-art DRL approaches that were the winning methods of the competition, achieving this with significantly fewer training observations. Furthermore, we demonstrate several good properties of the proposed DR-ALNS method: it is easily adapted to solve different routing problems, its learned policies perform consistently well across various instance sizes, and these policies can be directly applied to different problem variants.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31508Map Connectivity and Empirical Hardness of Grid-based Multi-Agent Pathfinding Problem2024-05-30T05:52:36-07:00Jingyao Renjingyaor@usc.eduEric Ewingeric_ewing@brown.eduT. K.
Satish Kumartkskwork@gmail.comSven Koenigskoenig@usc.eduNora Ayaniannora_ayanian@brown.eduWe present an empirical study of the relationship between map connectivity and the empirical hardness of the multi-agent pathfinding (MAPF) problem. By analyzing the second smallest eigenvalue (commonly known as lambda2) of the normalized Laplacian matrix of different maps, our initial study indicates that maps with smaller lambda2 tend to create more challenging instances when agents are generated uniformly at random. Additionally, we introduce a map generator based on Quality Diversity (QD) that is capable of producing maps with specified lambda2 ranges, offering a possible way to generate challenging MAPF instances. Despite the absence of a strictly monotonic correlation between lambda2 and the empirical hardness of MAPF, this study serves as a valuable initial investigation for gaining a deeper understanding of what makes a MAPF instance hard to solve.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31509The Story So Far on Narrative Planning2024-05-30T05:52:37-07:00Rogelio E. Cardona Riverar.cardona.rivera@utah.eduArnav Jhalaahjhala@ncsu.eduJulie Porteousjulie.porteous@rmit.edu.auR. Michael Youngrmichael.young@utah.eduNarrative planning is the use of automated planning to construct, communicate, and understand stories, a form of information to which human cognition and enaction is pre-disposed. We review the narrative planning problem in a manner suitable as an introduction to the area, survey different plan-based methodologies and affordances for reasoning about narrative, and discuss open challenges relevant to the broader AI community.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31510Learning General Policies for Planning through GPT Models2024-05-30T05:52:38-07:00Nicholas Rossettinicholas.rossetti@unibs.itMassimiliano Tummolomassimiliano.tummolo@uniroma1.itAlfonso Emilio Gerevinialfonso.gerevini@unibs.itLuca Putelliluca.putelli@unibs.itIvan Serinaivan.serina@unibs.itMattia Chiarimattia.chiari@unibs.itMatteo Olivatomatteo.olivato@unibs.itTransformer-based architectures, such as T5, BERT and GPT, have demonstrated revolutionary capabilities in Natural Language Processing. Several studies showed that deep learning models using these architectures not only possess remarkable linguistic knowledge, but they also exhibit forms of factual knowledge, common sense, and even programming skills. However, the scientific community still debates their reasoning capabilities, which have been recently tested in the context of automated AI planning; the literature presents mixed results, and the prevailing view is that current transformer-based models may not be adequate for planning. In this paper, we address this challenge differently. We introduce a GPT-based model customised for planning (PLANGPT) to learn a general policy for classical planning by training the model from scratch with a dataset of solved planning instances. Once PLANGPT has been trained for a domain, it can be used to generate a solution plan for an input problem instance in that domain. Our training procedure exploits automated planning knowledge to enhance the performance of the trained model. We build and evaluate our GPT model with several planning domains, and we compare its performance w.r.t.
other recent deep learning techniques for generalised planning, demonstrating the effectiveness of the proposed approach.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31511Efficiently Computing Transitions in Cartesian Abstractions2024-05-30T05:52:40-07:00Jendrik Seippjendrik.seipp@liu.seCounterexample-guided Cartesian abstraction refinement yields strong heuristics for optimal classical planning. The approach iteratively finds a new abstract solution, checks where it fails for the original task and refines the abstraction to avoid the same failure in subsequent iterations. The main bottleneck of this refinement loop is the memory needed for storing all abstract transitions. To address this issue, we introduce an algorithm that efficiently computes abstract transitions on demand. This drastically reduces the memory consumption and allows us to solve tasks during the refinement loop and during the search that were previously out of reach.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31512Imitating Cost-Constrained Behaviors in Reinforcement Learning2024-05-30T05:52:41-07:00Qian Shaoqianshao.2020@phdcs.smu.edu.sgPradeep Varakanthampradeepv@smu.edu.sgShih-Fen Chengsfcheng@smu.edu.sgComplex planning and scheduling problems have long been solved using various optimization or heuristic approaches. In recent years, imitation learning that aims to learn from expert demonstrations has been proposed as a viable alternative to solving these problems. Generally speaking, imitation learning is designed to learn either the reward (or preference) model or directly the behavioral policy by observing the behavior of an expert. Existing work in imitation learning and inverse reinforcement learning has focused on imitation primarily in unconstrained settings (e.g., no limit on fuel consumed by the vehicle). However, in many real-world domains, the behavior of an expert is governed not only by reward (or preference) but also by constraints. For instance, decisions on self-driving delivery vehicles are dependent not only on the route preferences/rewards (depending on past demand data) but also on the fuel in the vehicle and the time available. In such problems, imitation learning is challenging as decisions are not only dictated by the reward model but are also dependent on a cost-constrained model. In this paper, we provide multiple methods that match expert distributions in the presence of trajectory cost constraints through (a) Lagrangian-based method; (b) Meta-gradients to find a good trade-off between expected return and minimizing constraint violation; and (c) Cost-violation-based alternating gradient. We empirically show that leading imitation learning approaches imitate cost-constrained behaviors poorly and our meta-gradient-based approach achieves the best performance.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31513Accelerating Search-Based Planning for Multi-Robot Manipulation by Leveraging Online-Generated Experiences2024-05-30T05:52:42-07:00Yorai Shaoulyshaoul@andrew.cmu.eduItamar Mishaniimishani@andrew.cmu.eduMaxim Likhachevmaxim@cs.cmu.eduJiaoyang Lijiaoyangli@cmu.eduAn exciting frontier in robotic manipulation is the use of multiple arms at once. 
However, planning concurrent motions is a challenging task using current methods. The high-dimensional composite state space renders many well-known motion planning algorithms intractable. Recently, Multi-Agent Path Finding (MAPF) algorithms have shown promise in discrete 2D domains, providing rigorous guarantees. However, widely used conflict-based methods in MAPF assume an efficient single-agent motion planner. This poses challenges in adapting them to manipulation cases where this assumption does not hold, due to the high dimensionality of configuration spaces and the computational bottlenecks associated with collision checking. To this end, we propose an approach for accelerating conflict-based search algorithms by leveraging their repetitive and incremental nature -- making them tractable for use in complex scenarios involving multi-arm coordination in obstacle-laden environments. We show that our method preserves completeness and bounded sub-optimality guarantees, and demonstrate its practical efficacy through a set of experiments with up to 10 robotic arms.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31514Logical Specifications-guided Dynamic Task Sampling for Reinforcement Learning Agents2024-05-30T05:52:45-07:00Yash Shuklayash.shukla@tufts.eduTanushree Burmantanushree.burman@tufts.eduAbhishek N. Kulkarniabhishek.nkulkarni21@gmail.comRobert Wrightrobert.wright@gtri.gatech.eduAlvaro Velasquezalvarovelasquezucf@gmail.comJivko Sinapovjivko.sinapov@tufts.eduReinforcement Learning (RL) has made significant strides in enabling artificial agents to learn diverse behaviors. However, learning an effective policy often requires a large number of environment interactions. To mitigate sample complexity issues, recent approaches have used high-level task specifications, such as Linear Temporal Logic (LTLf) formulas or Reward Machines (RM), to guide the learning progress of the agent. In this work, we propose a novel approach, called Logical Specifications-guided Dynamic Task Sampling (LSTS), that learns a set of RL policies to guide an agent from an initial state to a goal state based on a high-level task specification, while minimizing the number of environmental interactions. Unlike previous work, LSTS does not assume information about the environment dynamics or the Reward Machine, and dynamically samples promising tasks that lead to successful goal policies. We evaluate LSTS on a gridworld and show that it achieves improved time-to-threshold performance on complex sequential decision-making problems compared to state-of-the-art RM and Automaton-guided RL baselines, such as Q-Learning for Reward Machines and Compositional RL from logical Specifications (DIRL). Moreover, we demonstrate that our method outperforms RM and Automaton-guided RL baselines in terms of sample-efficiency, both in a partially observable robotic task and in a continuous control robotic manipulation task.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31515Merging or Computing Saturated Cost Partitionings? A Merge Strategy for the Merge-and-Shrink Framework2024-05-30T05:52:46-07:00Silvan Sieverssilvan.sievers@unibas.chThomas Kellertho.keller@unibas.chGabriele Rögergabriele.roeger@unibas.chThe merge-and-shrink framework is a powerful tool for computing abstraction heuristics for optimal classical planning. 
Merging is one of its name-giving transformations. It entails computing the product of two factors of a factored transition system. To decide which two factors to merge, the framework uses a merge strategy. While there exist many merge strategies, it is generally unclear what constitutes a strong merge strategy, and a previous analysis shows that there is still lots of room for improvement with existing merge strategies. In this paper, we devise a new scoring function for score-based merge strategies based on answering the question whether merging two factors has any benefits over computing saturated cost partitioning heuristics over the factors instead. Our experimental evaluation shows that our new merge strategy achieves state-of-the-art performance on IPC benchmarks.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31516Decoupled Search for the Masses: A Novel Task Transformation for Classical Planning2024-05-30T05:52:47-07:00David Speckdavid.speck@liu.seDaniel Gnaddaniel.gnad@liu.seAutomated problem reformulation is a common technique in classical planning to identify and exploit problem structures. Decoupled search is an approach that automatically decomposes planning tasks based on their causal structure, often significantly reducing the search effort. However, its broad applicability is limited by the need for specialized algorithms. In this paper, we present an approach that embodies decoupled search for non-optimal planning through a novel task transformation. Specifically, given a task and a decomposition, we create a transformed task such that the state space of the transformed task is isomorphic to that of decoupled search on the original task. This eliminates the need for specialized algorithms and allows the use of various planning technology in the decoupled-search framework. Empirical evaluation shows that our method is empirically competitive with specialized decoupled algorithms and favorable to other related problem reformulation techniques.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31517Explaining the Space of SSP Policies via Policy-Property Dependencies: Complexity, Algorithms, and Relation to Multi-Objective Planning2024-05-30T05:52:48-07:00Marcel Steinmetzmarcel.steinmetz@laas.frSylvie Thiébauxsylvie.thiebaux@anu.edu.auDaniel Höllerhoeller@cs.uni-saarland.deFlorent Teichteil-Königsbuchflorent.teichteil-koenigsbuch@airbus.comStochastic shortest path (SSP) problems are a common framework for planning under uncertainty. However, the reactive structure of their solution policies is typically not easily comprehensible by an end-user, nor do planners justify the reasons behind their choice of a particular policy over others. To strengthen confidence in the planner's decision-making, recent work in classical planning has introduced a framework for explaining to the user the possible solution space in terms of necessary trade-offs between user-provided plan properties. Here, we extend this framework to SSPs. We introduce a notion of policy properties taking into account action-outcome uncertainty. We analyze formally the computational problem of identifying the exclusion relationships between policy properties, showing that this problem is in fact harder than SSP planning in a complexity theoretical sense. 
We show that all the relationships can be identified through a series of heuristic searches, which, if ordered in a clever way, yields an anytime algorithm. Further, we introduce an alternative method, which leverages a connection to multi-objective probabilistic planning to move all the computational burden to a preprocessing step. Finally, we explore empirically the feasibility of the proposed explanation methodology on a range of adapted IPPC benchmarks.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31518Addressing Myopic Constrained POMDP Planning with Recursive Dual Ascent2024-05-30T05:52:50-07:00Paula Stoccostoccop@stanford.eduSuhas Chundichundi72@stanford.eduArec Jamgochianarec@stanford.eduMykel J. Kochenderfermykel@stanford.eduLagrangian-guided Monte Carlo tree search with global dual ascent has been applied to solve large constrained partially observable Markov decision processes (CPOMDPs) online. In this work, we demonstrate that these global dual parameters can lead to myopic action selection during exploration, ultimately leading to suboptimal decision making. To address this, we introduce history-dependent dual variables that guide local action selection and are optimized with recursive dual ascent. We empirically compare the performance of our approach on a motivating toy example and two large CPOMDPs, demonstrating improved exploration, and ultimately, safer outcomes.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31519Robust Multi-Agent Pathfinding with Continuous Time2024-05-30T05:52:51-07:00Wen Jun Tanwjtan@ntu.edu.sgXueyan Tangasxytang@ntu.edu.sgWentong Caiaswtcai@ntu.edu.sgMulti-Agent Pathfinding (MAPF) is the problem of finding plans for multiple agents such that every agent moves from its start location to its goal location without collisions. If unexpected events delay some agents during plan execution, it may not be possible for the agents to continue following their plans without causing any collision. We define and solve a T-robust MAPF problem that seeks plans that can be followed even if some delays occur, under the generalized MAPFR setting with continuous time notions. The proposed approach is complete and provides provably optimal solutions. We also develop an exact method for collision detection among agents that can be delayed. We experimentally evaluate our proposed approach in terms of efficiency and plan cost.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31520Multi-Robot Connected Fermat Spiral Coverage2024-05-30T05:52:52-07:00Jingtao Tangtodd.j.tang@gmail.comHang Mahangma@sfu.caWe introduce Multi-Robot Connected Fermat Spiral (MCFS), a novel algorithmic framework for Multi-Robot Coverage Path Planning (MCPP) that adapts Coverage Fermat Spiral (CFS) from the computer graphics community to multi-robot coordination for the first time. MCFS uniquely enables the orchestration of multiple robots to generate coverage paths that contour around arbitrarily shaped obstacles, a feature notably lacking in traditional methods. 
Our framework not only enhances area coverage and optimizes task performance, particularly in terms of makespan, for workspaces rich in irregular obstacles but also addresses the challenges of path continuity and curvature critical for non-holonomic robots by generating smooth paths without decomposing the workspace. MCFS solves MCPP by constructing a graph of isolines and transforming MCPP into a combinatorial optimization problem, aiming to minimize the makespan while covering all vertices. Our contributions include developing a unified CFS version for scalable and adaptable MCPP, extending it to MCPP with novel optimization techniques for cost reduction and path continuity and smoothness, and demonstrating through extensive experiments that MCFS outperforms existing MCPP methods in makespan, path curvature, coverage ratio, and overlapping ratio. Our research marks a significant step in MCPP, showcasing the fusion of computer graphics and automated planning principles to advance the capabilities of multi-robot systems in complex environments. Our code is publicly available at https://github.com/reso1/MCFS.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31521Optimal Infinite Temporal Planning: Cyclic Plans for Priced Timed Automata2024-05-30T05:52:53-07:00Rasmus G. Tollundrasmusgtollund@gmail.comNicklas S. Johansennslorup@gmail.comKristian Ø. Nielsenkgl@cs.aau.dkÁlvaro Torralbaalto@cs.aau.dkKim G. Larsenkristianodum@gmail.comMany applications require infinite plans ---i.e. an infinite sequence of actions--- in order to carry out some given process indefinitely. In addition, it is desirable to guarantee optimality. In this paper, we address this problem in the setting of doubly-priced timed automata, where we show how to efficiently compute ratio-optimal cycles for optimal infinite plans. For efficient computation, we present symbolic λ-deduction (S-λD), an any-time algorithm that uses a symbolic representation (priced zones) to search the state-space with a compact representation of the time constraints. Our approach guarantees termination while arriving at an optimal solution. Our experimental evaluation shows that S-λD outperforms the alternative of searching in the concrete state space; is very robust with respect to fine-grained temporal constraints; and has a very good anytime behaviour.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31522Improving Learnt Local MAPF Policies with Heuristic Search2024-05-30T05:52:55-07:00Rishi Veerapanenirveerapa@andrew.cmu.eduQian Wangpwang649@usc.eduKevin Renkevinren@andrew.cmu.eduArthur Jakobssonajakobss@andrew.cmu.eduJiaoyang Lijiaoyangli@cmu.eduMaxim Likhachevmaxim@cs.cmu.eduMulti-agent path finding (MAPF) is the problem of finding collision-free paths for a team of agents to reach their goal locations. State-of-the-art classical MAPF solvers typically employ heuristic search to find solutions for hundreds of agents but are typically centralized and can struggle to scale when run with short timeouts. Machine learning (ML) approaches that learn policies for each agent are appealing as these could enable decentralized systems and scale well while maintaining good solution quality. Current ML approaches to MAPF have proposed methods that have started to scratch the surface of this potential. 
However, state-of-the-art ML approaches produce "local" policies that only plan for a single timestep and have poor success rates and scalability. Our main idea is that we can improve an ML local policy by using heuristic search methods on the output probability distribution to resolve deadlocks and enable full-horizon planning. We show several model-agnostic ways to use heuristic search with learnt policies that significantly improve the policies' success rates and scalability. To the best of our knowledge, this is the first time ML-based MAPF approaches have scaled to high-congestion scenarios (e.g. 20% agent density).2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31523Neural Action Policy Safety Verification: Applicability Filtering2024-05-30T05:52:56-07:00Marcel Vinzentvinzent@cs.uni-saarland.deJörg Hoffmannhoffmann@cs.uni-saarland.deNeural networks (NN) are an increasingly important representation of action policies pi. Applicability filtering is a commonly used practice in this context, restricting the action selection in pi to only applicable actions. Policy predicate abstraction (PPA) has recently been introduced to verify safety of neural pi, through over-approximating the state space subgraph induced by pi. Thus far, however, PPA does not permit applicability filtering, which is challenging due to the additional constraints that need to be taken into account. Here we overcome that limitation, through a range of algorithmic enhancements. In our experiments, our enhancements achieve several orders of magnitude speed-up over a baseline implementation, bringing PPA with applicability filtering close to the performance of PPA without such filtering.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31524Efficient Approximate Search for Multi-Objective Multi-Agent Path Finding2024-05-30T05:52:58-07:00Fangji Wangwang-fj20@mails.tsinghua.edu.cnHan Zhangzhan645@usc.eduSven Koenigskoenig@usc.eduJiaoyang Lijiaoyangli@cmu.eduThe Multi-Objective Multi-Agent Path Finding (MO-MAPF) problem is the problem of computing collision-free paths for a team of agents while minimizing multiple cost metrics. Most existing MO-MAPF algorithms aim to compute the Pareto frontier. However, the Pareto frontier can be time-consuming to compute. Our first main contribution is BB-MO-CBS-pex, an approximate MO-MAPF algorithm that computes an approximate frontier for a user-specific approximation factor. BB-MO-CBS-pex builds upon BB-MO-CBS, a state-of-the-art MO-MAPF algorithm, and leverages A*pex, a state-of-the-art single-agent multi-objective search algorithm, to speed up different parts of BB-MO-CBS. We also provide two speed-up techniques for BB-MO-CBS-pex. Our second main contribution is BB-MO-CBS-k, which builds upon BB-MO-CBS-pex and computes up to k solutions for a user-provided k-value. BB-MO-CBS-k is useful when it is unclear how to determine an appropriate approximation factor. Our experimental results show that both BB-MO-CBS-pex and BB-MO-CBS-k solved significantly more instances than BB-MO-CBS for different approximation factors and k-values, respectively.
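Approximate frontiers of the kind the BB-MO-CBS-pex abstract above refers to are usually defined through (1+ε)-dominance between cost vectors. The sketch below shows a generic form of that test and a greedy frontier filter; it is an editorial illustration of the notion, not necessarily the exact rule used inside A*pex or BB-MO-CBS-pex.

```python
from typing import Iterable, List, Sequence

def eps_dominates(u: Sequence[float], v: Sequence[float], eps: float) -> bool:
    """True if cost vector u (1+eps)-dominates cost vector v, i.e. u is at
    most a factor (1+eps) worse than v in every objective."""
    return all(ui <= (1.0 + eps) * vi for ui, vi in zip(u, v))

def approximate_frontier(
    solutions: Iterable[Sequence[float]], eps: float
) -> List[Sequence[float]]:
    """Greedy sketch: keep a solution only if no already-kept solution
    (1+eps)-dominates it. The result is one possible eps-approximate
    frontier of the input set (order-dependent, illustrative only)."""
    frontier: List[Sequence[float]] = []
    for cost in sorted(solutions):
        if not any(eps_dominates(kept, cost, eps) for kept in frontier):
            frontier.append(cost)
    return frontier

# Toy usage: with eps = 0.1, (10, 5) covers (11, 5) but not (7, 9).
if __name__ == "__main__":
    costs = [(10.0, 5.0), (11.0, 5.0), (7.0, 9.0)]
    print(approximate_frontier(costs, eps=0.1))  # [(7.0, 9.0), (10.0, 5.0)]
```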
Additionally, we compare BB-MO-CBS-pex with an approximate baseline algorithm derived from BB-MO-CBS and show that BB-MO-CBS-pex achieved speed-ups up to two orders of magnitude.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31525MAPF in 3D Warehouses: Dataset and Analysis2024-05-30T05:53:00-07:00Qian Wangpwang649@usc.eduRishi Veerapanenirveerapa@andrew.cmu.eduYu Wuyuwu3@andrew.cmu.eduJiaoyang Lijiaoyangli@cmu.eduMaxim Likhachevmaxim@cs.cmu.eduRecent works have made significant progress in multi-agent path finding (MAPF), with modern methods being able to scale to hundreds of agents, handle unexpected delays, work in groups, etc. The vast majority of these methods have focused on 2D "grid world" domains. However, modern warehouses often utilize multi-agent robotic systems that can move in 3D, enabling dense storage but resulting in a more complex multi-agent planning problem. Motivated by this, we introduce and experimentally analyze the application of MAPF to 3D warehouse management, and release the first (see http://mapf.info/index.php/Main/Benchmarks) open-source 3D MAPF dataset. We benchmark two state-of-the-art MAPF methods, EECBS and MAPF-LNS2, and show how different hyper-parameters affect these methods across various 3D MAPF problems. We also investigate how the warehouse structure itself affects MAPF performance. Based on our experimental analysis, we find that a fast low-level search is critical for 3D MAPF, EECBS's suboptimality significantly changes the effect of certain CBS techniques, and certain warehouse designs can noticeably influence MAPF scalability and speed. An additional important observation is that, overall, the tested 2D MAPF techniques scaled well to 3D warehouses and demonstrate how the MAPF community's progress in 2D can generalize to 3D warehouses.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31526Learning Generalised Policies for Numeric Planning2024-05-30T05:53:01-07:00Ryan Xiao Wangryan.wang@anu.edu.auSylvie Thiébauxsylvie.thiebaux@anu.edu.auWe extend Action Schema Networks (ASNets) to learn generalised policies for numeric planning, which features quantitative numeric state variables, preconditions and effects. We propose a neural network architecture that can reason about the numeric variables both directly and in context of other variables. We also develop a dynamic exploration algorithm for more efficient training, by better balancing the exploration versus learning tradeoff to account for the greater computational demand of numeric teacher planners. Experimentally, we find that the learned generalised policies are capable of outperforming traditional numeric planners on some domains, and the dynamic exploration algorithm to be on average much faster at learning effective generalised policies than the original ASNets training algorithm.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31527Tightest Admissible Shortest Path2024-05-30T05:53:03-07:00Eyal Weisseyal.weiss@biu.ac.ilAriel Felnerfelner@bgu.ac.ilGal A. Kaminkagalk@cs.biu.ac.ilThe shortest path problem in graphs is fundamental to AI. Nearly all variants of the problem and relevant algorithms that solve them ignore edge-weight computation time and its common relation to weight uncertainty. 
This implies that taking these factors into consideration can potentially lead to a performance boost in relevant applications. Recently, a generalized framework for weighted directed graphs was suggested, where edge-weight can be computed (estimated) multiple times, at increasing accuracy and run-time expense. We build on this framework to introduce the problem of finding the tightest admissible shortest path (TASP); a path with the tightest suboptimality bound on the optimal cost. This is a generalization of the shortest path problem to bounded uncertainty, where edge-weight uncertainty can be traded for computational cost. We present a complete algorithm for solving TASP, with guarantees on solution quality. Empirical evaluation supports the effectiveness of this approach.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31528Neuro-Symbolic Learning of Lifted Action Models from Visual Traces2024-05-30T05:53:04-07:00Kai Xioliver.xi@anu.edu.auStephen Gouldstephen.gould@anu.edu.auSylvie Thiébauxsylvie.thiebaux@anu.edu.auModel-based planners rely on action models to describe available actions in terms of their preconditions and effects. Nonetheless, manually encoding such models is challenging, especially in complex domains. Numerous methods have been proposed to learn action models from examples of plan execution traces. However, high-level information, such as state labels within traces, is often unavailable and needs to be inferred indirectly from raw observations. In this paper, we aim to learn lifted action models from visual traces --- sequences of image-action pairs depicting discrete successive trace steps. We present ROSAME, a differentiable neuRO-Symbolic Action Model lEarner that infers action models from traces consisting of probabilistic state predictions and actions. By combining ROSAME with a deep learning computer vision model, we create an end-to-end framework that jointly learns state predictions from images and infers symbolic action models. Experimental results demonstrate that our method succeeds in both tasks, using different visual state representations, with the learned action models often matching or even surpassing those created by humans.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31529Control in Stochastic Environment with Delays: A Model-based Reinforcement Learning Approach2024-05-30T05:53:05-07:00Zhiyuan Yaozyao9@stevens.eduIonut Florescuifloresc@stevens.eduChihoon Leeclee4@stevens.eduIn this paper we are introducing a new reinforcement learning method for control problems in environments with delayed feedback. Specifically, our method employs stochastic planning, versus previous methods that used deterministic planning. This allows us to embed risk preference in the policy optimization problem. We show that this formulation can recover the optimal policy for problems with deterministic transitions. We contrast our policy with two prior methods from literature. We apply the methodology to simple tasks to understand its features. 
Then, we compare the performance of the methods in controlling multiple Atari games.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31530Contrastive Explanations of Centralized Multi-agent Optimization Solutions2024-05-30T05:53:06-07:00Parisa Zehtabiparisa.zehtabi@jpmorgan.comAlberto Pozancoalberto.pozancolancho@jpmorgan.comAyala Bolchayalabl@shikumil.org.ilDaniel Borrajodaniel.borrajo@jpmchase.comSarit Kraussarit@cs.biu.ac.ilIn many real-world scenarios, agents are involved in optimization problems. Since most of these scenarios are over-constrained, optimal solutions do not always satisfy all agents. Some agents might be unhappy and ask questions of the form “Why does solution S not satisfy property P?”. We propose CMAOE, a domain-independent approach to obtain contrastive explanations by: (i) generating a new solution S′ where property P is enforced, while also minimizing the differences between S and S′; and (ii) highlighting the differences between the two solutions, with respect to the features of the objective function of the multi-agent system. Such explanations aim to help agents understand why the initial solution is better in the context of the multi-agent system than what they expected. We have carried out a computational evaluation that shows that CMAOE can generate contrastive explanations for large multi-agent optimization problems. We have also performed an extensive user study in four different domains that shows that: (i) after being presented with these explanations, humans’ satisfaction with the original solution increases; and (ii) the contrastive explanations generated by CMAOE are preferred or equally preferred by humans over the ones generated by state-of-the-art approaches.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31531Bounded-Suboptimal Weight-Constrained Shortest-Path Search via Efficient Representation of Paths2024-05-30T05:53:07-07:00Han Zhangzhan645@usc.eduOren Salzmansalzman.oren@gmail.comAriel Felnerfelner@bgu.ac.ilT. K. Satish Kumartkskwork@gmail.comSven Koenigskoenig@usc.eduIn the Weight-Constrained Shortest-Path (WCSP) problem, given a graph in which each edge is annotated with a cost and a weight, a start state, and a goal state, the task is to compute a minimum-cost path from the start state to the goal state with weight no larger than a given weight limit. While most existing works have focused on solving the WCSP problem optimally, many real-world situations admit a trade-off between efficiency and a suboptimality bound for the path cost. In this paper, we propose the bounded-suboptimal WCSP algorithm WC-A*pex, which is built on the state-of-the-art approximate bi-objective search algorithm A*pex. WC-A*pex uses an approximate representation of paths with similar costs and weights to compute a (1+ε)-suboptimal path, for a given ε. During its search, WC-A*pex avoids storing all paths explicitly and thereby reduces the search effort while still retaining its (1 + ε)-suboptimality bound.
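The WCSP problem stated in the abstract above can be written compactly as a constrained shortest-path program; the rendering below is a standard formulation of that statement, with notation chosen here rather than taken from the paper.

```latex
% Weight-Constrained Shortest Path (WCSP), as described in the abstract:
% minimize path cost subject to a bound W on the accumulated weight,
% over all paths \pi from the start state to the goal state.
\begin{aligned}
  \min_{\pi \in \Pi(s_{\mathrm{start}},\, s_{\mathrm{goal}})} \quad & \sum_{e \in \pi} c(e) \\
  \text{subject to} \quad & \sum_{e \in \pi} w(e) \le W .
\end{aligned}
% A path \pi' is (1+\varepsilon)-suboptimal if it satisfies the weight bound
% and \sum_{e \in \pi'} c(e) \le (1+\varepsilon)\, c^*, where c^* is the
% optimal constrained cost.
```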
https://ojs.aaai.org/index.php/ICAPS/article/view/31531Bounded-Suboptimal Weight-Constrained Shortest-Path Search via Efficient Representation of Paths2024-05-30T05:53:07-07:00Han Zhangzhan645@usc.eduOren Salzmansalzman.oren@gmail.comAriel Felnerfelner@bgu.ac.ilT. K. Satish Kumartkskwork@gmail.comSven Koenigskoenig@usc.eduIn the Weight-Constrained Shortest-Path (WCSP) problem, given a graph in which each edge is annotated with a cost and a weight, a start state, and a goal state, the task is to compute a minimum-cost path from the start state to the goal state with weight no larger than a given weight limit. While most existing works have focused on solving the WCSP problem optimally, many real-world situations admit a trade-off between efficiency and a suboptimality bound for the path cost. In this paper, we propose the bounded-suboptimal WCSP algorithm WC-A*pex, which is built on the state-of-the-art approximate bi-objective search algorithm A*pex. WC-A*pex uses an approximate representation of paths with similar costs and weights to compute a (1+ε)-suboptimal path, for a given ε. During its search, WC-A*pex avoids storing all paths explicitly and thereby reduces the search effort while still retaining its (1+ε)-suboptimality bound. On benchmark road networks, our experimental results show that WC-A*pex with ε = 0.01 (i.e., with a guaranteed suboptimality of at most 1%) achieves a speed-up of up to an order of magnitude over WC-A*, a state-of-the-art WCSP algorithm, and its bounded-suboptimal variant.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligence

https://ojs.aaai.org/index.php/ICAPS/article/view/31532A Counter-Example Based Approach to Probabilistic Conformant Planning2024-05-30T05:53:09-07:00Xiaodi Zhangxiaodi.zhang@anu.edu.auAlban Grastienalban.grastien@cea.frCharles Grettoncharles.gretton@gmail.comThis paper introduces a counter-example-based approach for solving probabilistic conformant planning (PCP) problems. Our algorithm incrementally generates candidate plans and identifies counter-examples until it finds a plan for which the probability of success is above the specified threshold. We prove that the algorithm is sound and complete. We further propose a variation of our algorithm that uses hitting sets to accelerate the generation of candidate plans. Experimental results show that our planner is particularly suited for problems with a high probability threshold.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligence

https://ojs.aaai.org/index.php/ICAPS/article/view/31533Improving the Efficiency and Efficacy of Multi-Agent Reinforcement Learning on Complex Railway Networks with a Local-Critic Approach2024-05-30T05:53:10-07:00Yuan Zhangyzhang@cs.uni-freiburg.deUmashankar Deekshithumashankar.deekshith@deutschebahn.comJianhong Wangjianhong.wang@manchester.ac.ukJoschka Boedeckerjboedeck@informatik.uni-freiburg.deComplex railway networks are challenging real-world multi-agent systems, often involving thousands of agents. Current planning methods depend heavily on expert knowledge to formulate solutions for specific cases and therefore generalize poorly to new scenarios, which has drawn significant attention to multi-agent reinforcement learning (MARL). Despite some successful applications in multi-agent decision-making tasks, MARL is hard to scale to large numbers of agents. This paper rethinks the curse of agents in the centralized-training-decentralized-execution (CTDE) paradigm and proposes a local-critic approach to address the issue. By combining the local critic with the PPO algorithm, we design a deep MARL algorithm denoted as local-critic PPO (LCPPO). In experiments, we evaluate the effectiveness of LCPPO on a complex railway network benchmark, Flatland, with various numbers of agents. Notably, LCPPO shows strong generalizability and robustness under changes to the environment.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligence
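As a hedged sketch of the local-critic idea in the LCPPO entry above (the actual network architecture, its integration with PPO, and the Flatland observation encoding are not reproduced here), the snippet below contrasts a fully centralized critic input, which grows with the number of agents, with a local one built from each agent's nearest neighbours; the names and the k-nearest-neighbour locality rule are assumptions for illustration.

import numpy as np

def critic_inputs(obs, positions, k=3):
    """Hypothetical local-critic input builder, not the paper's architecture.

    obs: (n_agents, obs_dim) local observations.
    positions: (n_agents, 2) agent coordinates used to define locality.
    A fully centralized critic would concatenate all n_agents observations;
    a local critic caps the input size at (k+1) * obs_dim regardless of n.
    """
    inputs = []
    for i in range(len(obs)):
        dists = np.linalg.norm(positions - positions[i], axis=1)
        neighbours = np.argsort(dists)[:k + 1]  # includes agent i itself
        inputs.append(np.concatenate([obs[j] for j in neighbours]))
    return np.stack(inputs)

# 100 agents with 8-dim observations: a centralized critic input would be
# 800-dim per agent, while the local-critic input stays at 32-dim.
rng = np.random.default_rng(0)
obs = rng.normal(size=(100, 8))
pos = rng.uniform(0, 50, size=(100, 2))
print(critic_inputs(obs, pos, k=3).shape)  # (100, 32)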
https://ojs.aaai.org/index.php/ICAPS/article/view/31534Planning and Execution in Multi-Agent Path Finding: Models and Algorithms2024-05-30T05:53:14-07:00Yue Zhangyue.zhang@monash.eduZhe Chenzhe.chen@monash.eduDaniel Harabordaniel.harabor@monash.eduPierre Le Bodicpierre.lebodic@monash.eduPeter J. Stuckeypeter.stuckey@monash.eduIn applications of Multi-Agent Path Finding (MAPF), it is often the sum of planning and execution times that needs to be minimised (i.e., the Goal Achievement Time). Yet current methods seldom optimise for this objective. Optimal algorithms reduce execution time, but may require exponential planning time. Non-optimal algorithms reduce planning time, but at the expense of increased path length. To address these limitations, we introduce PIE (Planning and Improving while Executing), a new framework for concurrent planning and execution in MAPF. We show how different instantiations of PIE affect practical performance, including initial planning time, action commitment time, and concurrent vs. sequential planning and execution. We then adapt PIE to Lifelong MAPF, a popular application setting where agents are continuously assigned new goals and where additional decisions are required to ensure feasibility. We examine a variety of approaches to overcome these challenges and conduct comparative experiments against recently proposed alternatives. Results show that PIE substantially outperforms existing methods for One-shot and Lifelong MAPF.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligence

https://ojs.aaai.org/index.php/ICAPS/article/view/31535Decentralized, Decomposition-Based Observation Scheduling for a Large-Scale Satellite Constellation2024-05-30T05:53:15-07:00Itai Zilbersteinitai.m.zilberstein@jpl.nasa.govAnanya Raoananyara@andrew.cmu.eduMatthew Salismatthew.salis@jpl.nasa.govSteve Chiensteve.a.chien@jpl.nasa.govDeploying multi-satellite constellations for Earth observation requires coordinating potentially hundreds of spacecraft. With increasing on-board capability for autonomy, we can view the constellation as a multi-agent system (MAS) and employ decentralized scheduling solutions. We formulate the problem as a distributed constraint optimization problem (DCOP) and aim for scalable inter-agent communication. The problem consists of millions of variables, which, coupled with its structure, makes existing DCOP algorithms inadequate for this application. We develop a scheduling approach that employs a well-coordinated heuristic, referred to as the Geometric Neighborhood Decomposition (GND) heuristic, to decompose the global DCOP into sub-problems so as to enable the application of DCOP algorithms. We present the Neighborhood Stochastic Search (NSS) algorithm, a decentralized algorithm to effectively solve the multi-satellite constellation observation scheduling problem using decomposition. In summary, we identify the roadblocks to deploying DCOP solvers on a large-scale, real-world problem, propose a decomposition-based scheduling approach that is effective at tackling large-scale DCOPs, empirically evaluate the approach against other baseline algorithms to demonstrate its effectiveness, and discuss the generality of the approach.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligence
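Finally, to give a flavour of the decomposition described in the last entry, here is a small Python sketch that bins satellites into longitude bands (a crude stand-in for the Geometric Neighborhood Decomposition heuristic) and then improves each band's sub-schedule by simple stochastic local search; the band width, the toy coverage objective, and the hill-climbing rule are illustrative assumptions, not the NSS algorithm itself.

import random

def geometric_neighborhoods(longitudes, width_deg=30.0):
    """Group satellites into fixed-width longitude bands
    (a crude stand-in for the paper's GND heuristic)."""
    bands = {}
    for sat, lon in longitudes.items():
        bands.setdefault(int(lon // width_deg), []).append(sat)
    return list(bands.values())

def local_score(schedule, sats):
    """Toy local objective: number of distinct targets covered in a band."""
    return len({schedule[s] for s in sats})

def neighborhood_stochastic_search(schedule, bands, targets, iters=200, seed=0):
    """Improve each band independently by random reassignments,
    keeping a change only if the band's local objective does not get worse."""
    rng = random.Random(seed)
    for sats in bands:
        for _ in range(iters):
            sat = rng.choice(sats)
            old = schedule[sat]
            schedule[sat] = rng.choice(targets)
            if local_score(schedule, sats) < local_score({**schedule, sat: old}, sats):
                schedule[sat] = old  # revert strictly worsening moves
    return schedule

rng = random.Random(1)
longitudes = {f"sat{i}": rng.uniform(0.0, 360.0) for i in range(12)}
schedule = {s: 0 for s in longitudes}  # every satellite starts on target 0
bands = geometric_neighborhoods(longitudes)
schedule = neighborhood_stochastic_search(schedule, bands, targets=[0, 1, 2])
print(len(bands), "bands; targets used:", sorted(set(schedule.values())))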