Proceedings of the International Conference on Automated Planning and Scheduling
Feed: https://ojs.aaai.org/index.php/ICAPS/issue/feed (updated 2024-05-30)
Publications Department (publications21@aaai.org), Open Journal Systems

The annual ICAPS conference series was formed in 2003 through the merger of two preexisting biennial conferences, the International Conference on Artificial Intelligence Planning and Scheduling (AIPS) and the European Conference on Planning (ECP). ICAPS continues the traditional high standards of AIPS and ECP as an archival forum for new research in the field of automated planning and scheduling. The Proceedings of the International Conference on Automated Planning and Scheduling contains the annual, archival published work of the ICAPS conference.

All articles below were published on 2024-05-30. Copyright (c) 2024 Association for the Advancement of Artificial Intelligence.

Specifying Goals to Deep Neural Networks with Answer Set Programming
Authors: Forest Agostinelli (foresta@cse.sc.edu), Rojina Panta (rpanta@email.sc.edu), Vedant Khandelwal (vedant@mailbox.sc.edu)
https://ojs.aaai.org/index.php/ICAPS/article/view/31454
Recently, methods such as DeepCubeA have used deep reinforcement learning to learn domain-specific heuristic functions in a largely domain-independent fashion. However, such methods either assume a predetermined goal or assume that goals will be given as fully-specified states. Therefore, specifying a set of goal states to these learned heuristic functions is often impractical. To address this issue, we introduce a method of training a heuristic function that estimates the distance between a given state and a set of goal states represented as a set of ground atoms in first-order logic. Furthermore, to allow for more expressive goal specification, we introduce techniques for specifying goals as answer set programs and using answer set solvers to discover sets of ground atoms that meet the specified goals. In our experiments with the Rubik's cube, sliding tile puzzles, and Sokoban, we show that we can specify and reach different goals without any need to re-train the heuristic function. Our code is publicly available at https://github.com/forestagostinelli/SpecGoal.
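Editor's illustration (not part of the paper above): the entry distinguishes fully-specified goal states from goals given as sets of ground atoms. The minimal Python sketch below shows that representation; the atom strings are invented examples, and the real SpecGoal system additionally uses answer set solvers to enumerate such atom sets.

```python
# Illustrative sketch only (not the SpecGoal implementation): representing a goal
# as a set of ground atoms and checking whether a state satisfies it.
# The atom names (e.g. "on(a,b)") are hypothetical examples.

State = frozenset  # a state is the set of ground atoms that hold in it
Goal = frozenset   # a goal is a set of ground atoms that must all hold

def satisfies(state: State, goal: Goal) -> bool:
    """A state satisfies the goal iff every goal atom holds in it."""
    return goal <= state

state = State({"on(a,b)", "on(b,table)", "clear(a)"})
goal = Goal({"on(a,b)"})          # partially specified: only one atom is constrained
print(satisfies(state, goal))     # True: many distinct states satisfy this goal
```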
Exact Multi-objective Path Finding with Negative Weights
Authors: Saman Ahmadi (saman-ahmadi@live.com), Nathan R. Sturtevant (nathanst@ualberta.ca), Daniel Harabor (daniel.harabor@monash.edu), Mahdi Jalili (mahdi.jalili@rmit.edu.au)
https://ojs.aaai.org/index.php/ICAPS/article/view/31455
The point-to-point Multi-objective Shortest Path (MOSP) problem is a classic yet challenging task that involves finding all Pareto-optimal paths between two points in a graph with multiple edge costs. Recent studies have shown that employing A* search can lead to state-of-the-art performance in solving MOSP instances with non-negative costs. This paper proposes a novel A*-based multi-objective search framework that not only handles graphs with negative costs and even negative cycles but also incorporates multiple speed-up techniques to enhance the efficiency of exhaustive search with A*. Through extensive experiments, our algorithm demonstrates remarkable success in solving difficult MOSP instances, outperforming leading solutions by several factors.

On the Computational Complexity of Stackelberg Planning and Meta-Operator Verification
Authors: Gregor Behnke (galvusdamor@gmail.com), Marcel Steinmetz (marcel.steinmetz@laas.fr)
https://ojs.aaai.org/index.php/ICAPS/article/view/31456
Stackelberg planning is a recently introduced single-turn two-player adversarial planning model, where two players act in a joint classical planning task, the objective of the first player being to hamper the second player from achieving its goal. This places the Stackelberg planning problem somewhere between classical planning and general combinatorial two-player games. But where exactly? All investigations of Stackelberg planning so far focused on practical aspects. We close this gap by conducting the first theoretical complexity analysis of Stackelberg planning. We show that in general Stackelberg planning is actually no harder than classical planning. Under a polynomial plan-length restriction, however, Stackelberg planning is a level higher up in the polynomial complexity hierarchy, suggesting that compilations into classical planning come with a worst-case exponential plan-length increase. In attempts to identify tractable fragments, we further study its complexity under various planning task restrictions, showing that Stackelberg planning remains intractable where classical planning is not. We finally inspect the complexity of meta-operator verification, a problem that has been recently connected to Stackelberg planning.

Non-deterministic Planning for Hyperproperty Verification
Authors: Raven Beutner (raven.beutner@cispa.de), Bernd Finkbeiner (finkbeiner@cispa.saarland)
https://ojs.aaai.org/index.php/ICAPS/article/view/31457
Non-deterministic planning aims to find a policy that achieves a given objective in an environment where actions have uncertain effects, and the agent - potentially - only observes parts of the current state. Hyperproperties are properties that relate multiple paths of a system and can, e.g., capture security and information-flow policies. Popular logics for expressing temporal hyperproperties - such as HyperLTL - extend LTL by offering selective quantification over executions of a system. In this paper, we show that planning offers a powerful intermediate language for the automated verification of hyperproperties. Concretely, we present an algorithm that, given a HyperLTL verification problem, constructs a non-deterministic multi-agent planning instance (in the form of a QDec-POMDP) that, when admitting a plan, implies the satisfaction of the verification problem. We show that for large fragments of HyperLTL, the resulting planning instance corresponds to a classical, FOND, or POND planning problem. We implement our encoding in a prototype verification tool and report on encouraging experimental results.

On Policy Reuse: An Expressive Language for Representing and Executing General Policies that Call Other Policies
Authors: Blai Bonet (bonet@cs.ucla.edu), Dominik Drexler (dominik.drexler@liu.se), Héctor Geffner (hector.geffner@ml.rwth-aachen.de)
https://ojs.aaai.org/index.php/ICAPS/article/view/31458
Recently, a simple but powerful language for expressing and learning general policies and problem decompositions (sketches) has been introduced in terms of rules defined over a set of Boolean and numerical features. In this work, we consider three extensions of this language aimed at making policies and sketches more flexible and reusable: internal memory states, as in finite state controllers; indexical features, whose values are a function of the state and a number of internal registers that can be loaded with objects; and modules that wrap up policies and sketches and allow them to call each other by passing parameters. In addition, unlike general policies that select state transitions rather than ground actions, the new language allows for the selection of such actions. The expressive power of the resulting language for policies and sketches is illustrated through a number of examples.

Abstraction Heuristics for Factored Tasks
Authors: Clemens Büchner (clemens.buechner@unibas.ch), Patrick Ferber (patrick.ferber@unibas.ch), Jendrik Seipp (jendrik.seipp@liu.se), Malte Helmert (malte.helmert@unibas.ch)
https://ojs.aaai.org/index.php/ICAPS/article/view/31459
One of the strongest approaches for optimal classical planning is A* search with heuristics based on abstractions of the planning task. Abstraction heuristics are well studied in planning formalisms without conditional effects such as SAS+. However, conditional effects are crucial to model many planning tasks compactly. In this paper, we focus on *factored* tasks, which allow a specific form of conditional effect, where effects on variable x can only depend on the value of x. We generalize projections, domain abstractions, Cartesian abstractions and the counterexample-guided abstraction refinement method to this formalism. While merge-and-shrink already covers factored tasks in theory, we provide an implementation that does so. In our experiments, we compare these abstraction-based heuristics to other heuristics supporting conditional effects, as well as symbolic search. On our new benchmark set of factored tasks, pattern database heuristics solve the most problems, followed by symbolic approaches on par with domain abstractions. The more general Cartesian abstractions fall behind in terms of coverage but usually solve problems the fastest among all tested approaches. The generality of merge-and-shrink abstractions does not seem to be beneficial for these factored tasks.
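Editor's illustration (not from the paper above): a minimal sketch of the "factored" restriction on conditional effects, namely that the effect on a variable x may only be conditioned on the current value of x itself. The variable and value names are invented.

```python
# Sketch of a factored operator: each variable carries its own
# {old value -> new value} rules and never looks at other variables.

from typing import Dict

State = Dict[str, str]                       # variable -> current value
FactoredEffect = Dict[str, Dict[str, str]]   # variable -> {old value -> new value}

def apply_factored_operator(state: State, effect: FactoredEffect) -> State:
    """Apply per-variable conditional effects; each rule depends only on its own variable."""
    successor = dict(state)
    for var, rules in effect.items():
        old = state[var]
        if old in rules:                     # the condition mentions var only
            successor[var] = rules[old]
    return successor

# Example: a "toggle" operator that flips a switch regardless of the other variables.
state = {"switch": "off", "door": "closed"}
toggle = {"switch": {"off": "on", "on": "off"}}
print(apply_factored_operator(state, toggle))  # {'switch': 'on', 'door': 'closed'}
```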
Multi-Agent Temporal Task Solving and Plan Optimization
Authors: J. Caballero Testón (javier.caballerot@edu.uah.es), Maria D. R-Moreno (malola.rmoreno@uah.es)
https://ojs.aaai.org/index.php/ICAPS/article/view/31460
Several multi-agent techniques are utilized to reduce the complexity of classical planning tasks; however, their applicability to temporal planning domains is a currently open line of study in the field of Automated Planning. In this paper, we present MA-LAMA, a factored, centralized, unthreaded, satisficing, multi-agent temporal planner that exploits the 'multi-agent nature' of temporal domains to perform plan optimization. In MA-LAMA, temporal tasks are translated to the constrained snap-actions paradigm, and an automatic agent decomposition, goal assignment, and required cooperation analysis are carried out to build independent search steps, called Search Phases. These Search Phases are then solved by consecutive agent local searches, using classical heuristics and temporal constraints. Experiments show that MA-LAMA is able to solve a wide range of classical and temporal multi-agent domains, performing significantly better in plan quality than other state-of-the-art temporal planners.

Taming Discretised PDDL+ through Multiple Discretisations
Authors: Matteo Cardellini (me@matteocardellini.it), Marco Maratea (marco.maratea@unical.it), Francesco Percassi (f.percassi@hud.ac.uk), Enrico Scala (enrico.scala@unibs.it), Mauro Vallati (m.vallati@hud.ac.uk)
https://ojs.aaai.org/index.php/ICAPS/article/view/31461
The PDDL+ formalism allows the use of planning techniques in applications that require the ability to perform hybrid discrete-continuous reasoning. PDDL+ problems are notoriously challenging to tackle, and a well-established approach to reasoning upon them is discretisation. Existing systems rely on a single discretisation delta or, at most, two: a simulation delta to model the dynamics of the environment, and a planning delta that is used to specify when decisions can be taken. However, there exist cases where this rigid schema is not ideal, for instance when agents with very different speeds need to cooperate or interact in a shared environment, and a more flexible approach that can accommodate more deltas is necessary. To address the needs of this class of hybrid planning problems, in this paper we introduce a reformulation approach that allows the encapsulation of different levels of discretisation in PDDL+ models, hence allowing any domain-independent planning engine to reap the benefits. Further, we provide the community with a new set of benchmarks that highlights the limits of fixed discretisation.

Return to Tradition: Learning Reliable Heuristics with Classical Machine Learning
Authors: Dillon Z. Chen (dillon.chen@laas.fr), Felipe Trevizan (felipe.trevizan@gmail.com), Sylvie Thiébaux (sylvie.thiebaux@anu.edu.au)
https://ojs.aaai.org/index.php/ICAPS/article/view/31462
Current approaches for learning for planning have yet to achieve competitive performance against classical planners in several domains, and have poor overall performance. In this work, we construct novel graph representations of lifted planning tasks and use the WL algorithm to generate features from them. These features are used with classical machine learning methods which have up to 2 orders of magnitude fewer parameters and train up to 3 orders of magnitude faster than the state-of-the-art deep learning for planning models. Our novel approach, WL-GOOSE, reliably learns heuristics from scratch and outperforms the hFF heuristic in a fair competition setting. It also outperforms or ties with LAMA on 4 out of 10 domains on coverage and 7 out of 10 domains on plan quality. WL-GOOSE is the first learning for planning model which achieves these feats. Furthermore, we study the connections between our novel WL feature generation method, previous theoretically flavoured learning architectures, and Description Logic Features for planning.

More Flexible Proximity Wildcards Path Planning with Compressed Path Databases
Authors: Xi Chen (1790144051@qq.com), Yue Zhang (1436388626@qq.com), Yonggang Zhang (zhangyg@jlu.edu.cn)
https://ojs.aaai.org/index.php/ICAPS/article/view/31463
Grid-based path planning is one of the classic problems in AI, and a popular topic in application areas such as computer games and robotics. Compressed Path Databases (CPDs) are recognized as a state-of-the-art method for grid-based path planning; they are able to find an optimal path extremely fast without state-space search. In recent years, researchers have tended to focus on improving CPDs by reducing CPD size or improving search performance. Among various methods, proximity wildcards are one of the most proven improvements in reducing the size of CPDs. However, their proximity area is significantly restricted by complex terrain, which significantly affects the pathfinding efficiency and causes additional costs. In this paper, we enhance CPDs from the perspective of improving search efficiency and reducing search costs. Our work focuses on using more flexible methods to obtain larger proximity areas, so that more heuristic information can be used to improve search performance. Experiments conducted on the Grid-Based Path Planning Competition (GPPC) benchmarks demonstrate that the two proposed methods can effectively improve search efficiency and reduce search costs by up to 3 orders of magnitude. Remarkably, our methods can further reduce the storage cost and improve the compression capability of CPDs simultaneously.
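Editor's illustration (not from the paper above): the core Compressed Path Database idea, before compression and before proximity wildcards, is a first-move table that lets paths be extracted by repeated lookups instead of search. The tiny node numbering and table below are invented.

```python
# first_move[(s, t)] = neighbour of s that lies on an optimal path from s to t.
# A real CPD compresses this table; the extraction loop is the same.
first_move = {
    (0, 2): 1, (1, 2): 2,   # going right along a 3-node corridor 0-1-2
    (2, 0): 1, (1, 0): 0,   # and back again
}

def extract_path(source: int, target: int) -> list:
    """Follow first-move entries until the target is reached (no state-space search)."""
    path = [source]
    current = source
    while current != target:
        current = first_move[(current, target)]
        path.append(current)
    return path

print(extract_path(0, 2))  # [0, 1, 2]
```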
On Verifying Linear Execution Strategies in Planning Against Nature
Authors: Lukáš Chrpa (chrpaluk@cvut.cz), Erez Karpas (karpase@gmail.com)
https://ojs.aaai.org/index.php/ICAPS/article/view/31464
While planning and acting in environments in which nature can trigger non-deterministic events, the agent has to consider that the state of the environment might change without its consent. Practically, it means that the agent has to make sure that it eventually achieves its goal (if possible) despite the acts of nature. In this paper, we first formalize the semantics of such problems in Alternating-time Temporal Logic, which allows us to prove some theoretical properties of different types of solutions. Then, we focus on linear execution strategies, which resemble classical plans in that they follow a fixed sequence of actions. We show that any problem that can be solved by a linear execution strategy can be solved by a particular form of linear execution strategy which assigns wait-for preconditions to each action in the plan that specify when to execute that action. Then, we propose a sound algorithm that verifies a sequence of actions and assigns wait-for preconditions to them by leveraging abstraction.

Planning and Acting While the Clock Ticks
Authors: Andrew Coles (andrew.coles@kcl.ac.uk), Erez Karpas (karpase@gmail.com), Andrey Lavrinenko (andreyl@post.bgu.ac.il), Wheeler Ruml (ruml@cs.unh.edu), Solomon Eyal Shimony (shimony@cs.bgu.ac.il), Shahaf Shperberg (shperbsh@bgu.ac.il)
https://ojs.aaai.org/index.php/ICAPS/article/view/31465
Standard temporal planning assumes that planning takes place offline, and then execution starts at time 0. Recently, situated temporal planning was introduced, where planning starts at time 0, and execution occurs after planning terminates. Situated temporal planning reflects a more realistic scenario where time passes during planning. However, in situated temporal planning a complete plan must be generated before any action is executed. In some problems with time pressure, timing is too tight to complete planning before the first action must be executed. For example, an autonomous car that has a truck backing towards it should probably move out of the way now, and plan how to get to its destination later. In this paper, we propose a new problem setting: concurrent planning and execution, in which actions can be dispatched (executed) before planning terminates. Unlike previous work on planning and execution, we must handle wall clock deadlines that affect action applicability and goal achievement (as in situated planning) while also supporting dispatching actions before a complete plan has been found. We extend previous work on metareasoning for situated temporal planning to develop an algorithm for this new setting. Our empirical evaluation shows that when there is strong time pressure, our approach outperforms situated temporal planning.

Planning with Object Creation
Authors: Augusto B. Corrêa (augusto.blaascorrea@unibas.ch), Giuseppe De Giacomo (degiacomo@diag.uniroma1.it), Malte Helmert (malte.helmert@unibas.ch), Sasha Rubin (sasha.rubin@sydney.edu.au)
https://ojs.aaai.org/index.php/ICAPS/article/view/31466
Classical planning problems are defined using some specification language, such as PDDL. The domain expert defines action schemas, objects, the initial state, and the goal. One key aspect of PDDL is that the set of objects cannot be modified during plan execution. While this is fine in many domains, sometimes it makes modeling more complicated. This may impact the performance of planners, and it requires the domain expert to bound the number of required objects beforehand, which can be a challenge. We introduce an extension to the classical planning formalism, where action effects can create and remove objects. This problem is semi-decidable, but it becomes decidable if we can bound the number of objects in any given state, even though the state space is still infinite. On the practical side, we extend the Powerlifted planning system to support this PDDL extension. Our results show that this extension improves the performance of Powerlifted while supporting more natural PDDL models.

Multi-Objective Electric Vehicle Route and Charging Planning with Contraction Hierarchies
Authors: Marek Cuchý (marek.cuchy@gmail.com), Jiří Vokřínek (jiri.vokrinek@fel.cvut.cz), Michal Jakob (jakobmic@fel.cvut.cz)
https://ojs.aaai.org/index.php/ICAPS/article/view/31467
Electric vehicle (EV) travel planning is a complex task that involves planning the routes and the charging sessions for EVs while optimizing travel duration and cost. We show the applicability of the multi-objective EV travel planning algorithm with practically usable solution times on country-sized road graphs with a large number of charging stations and a realistic EV model. The approach is based on multi-objective A* search enhanced by Contraction Hierarchies, optimal dimensionality reduction, and sub-optimal ϵ-relaxation techniques. We performed an extensive empirical evaluation on 182,000 problem instances showing the impact of various algorithm settings on real-world maps of Bavaria and Germany with more than 12,000 charging stations. The results show the proposed approach is the first one capable of performing such a genuine multi-objective optimization on realistically large country-scale problem instances while achieving practically usable planning times on the order of seconds with only a minor loss of solution quality. The achieved speed-up varies from ~11× for optimal solutions to more than 250× for sub-optimal solutions compared to vanilla multi-objective A*.
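Editor's illustration (not from the paper above): multi-objective planners such as the EV planner keep only Pareto-optimal labels, e.g. (travel duration, monetary cost) pairs. A minimal sketch of the dominance test and filter follows; the example labels are invented.

```python
# Pareto dominance over bi-objective labels (duration, cost), both to be minimized.
from typing import List, Tuple

Label = Tuple[float, float]  # (travel duration in minutes, monetary cost in euros)

def dominates(a: Label, b: Label) -> bool:
    """a dominates b if a is no worse in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_filter(labels: List[Label]) -> List[Label]:
    """Keep only the non-dominated labels."""
    return [l for l in labels
            if not any(dominates(other, l) for other in labels if other != l)]

labels = [(120.0, 8.5), (150.0, 6.0), (130.0, 9.0)]
print(pareto_filter(labels))  # [(120.0, 8.5), (150.0, 6.0)]
```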
Combined Task and Motion Planning via Sketch Decompositions
Authors: Magí Dalmau Moreno (magi.dalmau@eurecat.org), Néstor García (nestor.garcia@eurecat.org), Vicenç Gómez (vicen.gomez@upf.edu), Héctor Geffner (hector.geffner@ml.rwth-aachen.de)
https://ojs.aaai.org/index.php/ICAPS/article/view/31468
The challenge in combined task and motion planning (TAMP) is the effective integration of a search over a combinatorial space, usually carried out by a task planner, and a search over a continuous configuration space, carried out by a motion planner. Using motion planners for testing the feasibility of task plans and filling out the details is not effective because it makes the geometrical constraints play a passive role. This work introduces a new interleaved approach for integrating the two dimensions of TAMP that makes use of sketches, a recent simple but powerful language for expressing the decomposition of problems into subproblems. A sketch has width 1 if it decomposes the problem into subproblems that can be solved greedily in linear time. In the paper, a general sketch is introduced for several classes of TAMP problems which has width 1 under suitable assumptions. While sketch decompositions have been developed for classical planning, they offer two important benefits in the context of TAMP. First, when a task plan is found to be unfeasible due to the geometric constraints, the combinatorial search resumes in a specific subproblem. Second, the sampling of object configurations is not done once, globally, at the start of the search, but locally, at the start of each subproblem. Optimizations of this basic setting are also considered and experimental results over existing and new pick-and-place benchmarks are reported.

Planning Domain Simulation: An Interactive System for Plan Visualisation
Authors: Emanuele De Pellegrin (ed50@hw.ac.uk), Ronald P. A. Petrick (r.petrick@hw.ac.uk)
https://ojs.aaai.org/index.php/ICAPS/article/view/31469
Representing and manipulating domain knowledge is essential for developing systems that can visualize plans. This paper presents a novel plan visualisation system called Planning Domain Simulation (PDSim) that employs knowledge representation and manipulation techniques to support the plan visualization process. PDSim can use PDDL or the Unified Planning Library's Python representation as the underlying language for modelling planning problems and provides an interface for users to manipulate this representation through interaction with the Unity game engine and a set of planners. The system's features include visualising plan components and their relationships, identifying plan conflicts, and examples applied to real-world problems. The benefits and limitations of PDSim are also discussed, highlighting future research directions in the area.

Learning Quadruped Locomotion Policies Using Logical Rules
Authors: David DeFazio (ddefazi1@binghamton.edu), Yohei Hayamizu (yhayami1@binghamton.edu), Shiqi Zhang (zhangs@binghamton.edu)
https://ojs.aaai.org/index.php/ICAPS/article/view/31470
Quadruped animals are capable of exhibiting a diverse range of locomotion gaits. While progress has been made in demonstrating such gaits on robots, current methods rely on motion priors, dynamics models, or other forms of extensive manual effort. People can use natural language to describe dance moves. Could one use a formal language to specify quadruped gaits? To this end, we aim to enable easy gait specification and efficient policy learning. Leveraging Reward Machines (RMs) for high-level gait specification over foot contacts, our approach is called RM-based Locomotion Learning (RMLL), and supports adjusting gait frequency at execution time. Gait specification is enabled through the use of a few logical rules per gait (e.g., alternate between moving front feet and back feet) and does not require labor-intensive motion priors. Experimental results in simulation highlight the diversity of learned gaits (including two novel gaits), their energy consumption and stability across different terrains, and the superior sample-efficiency when compared to baselines. We also demonstrate these learned policies with a real quadruped robot. Video and supplementary materials: https://sites.google.com/view/rm-locomotion-learning/home
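Editor's illustration (not the RMLL specification): a Reward Machine is a finite-state machine over high-level events that emits reward on its transitions. The toy sketch below encodes the rule quoted in the abstract ("alternate between moving front feet and back feet"); the state names, event labels, and reward values are invented, and the real specification is richer.

```python
# RM transition table: (rm_state, observed foot-contact event) -> (next_rm_state, reward)
TRANSITIONS = {
    ("expect_front", "front_feet_moved"): ("expect_back", 1.0),
    ("expect_front", "back_feet_moved"):  ("expect_front", 0.0),
    ("expect_back",  "back_feet_moved"):  ("expect_front", 1.0),
    ("expect_back",  "front_feet_moved"): ("expect_back", 0.0),
}

def rm_step(rm_state: str, event: str):
    """Advance the reward machine on one high-level event and emit its reward."""
    return TRANSITIONS.get((rm_state, event), (rm_state, 0.0))

rm_state, total = "expect_front", 0.0
for event in ["front_feet_moved", "back_feet_moved", "back_feet_moved"]:
    rm_state, reward = rm_step(rm_state, event)
    total += reward
print(total)  # 2.0: reward is only given while front and back feet alternate
```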
Higher-Dimensional Potential Heuristics: Lower Bound Criterion and Connection to Correlation Complexity
Authors: Simon Dold (simon.dold@unibas.ch), Malte Helmert (malte.helmert@unibas.ch)
https://ojs.aaai.org/index.php/ICAPS/article/view/31471
Correlation complexity is a measure of a planning task indicating how hard it is. The work introducing it provides sufficient criteria to detect a correlation complexity of 2 on a planning task. It also introduced an example of a planning task with correlation complexity 3. In our work, we introduce a criterion to detect arbitrary correlation complexity and extend the mentioned example to show with the new criterion that planning tasks with arbitrary correlation complexity exist.

New Fuzzing Biases for Action Policy Testing
Authors: Jan Eisenhut (eisenhut@cs.uni-saarland.de), Xandra Schuler (s8xaschu@stud.uni-saarland.de), Daniel Fišer (danfis@danfis.cz), Daniel Höller (hoeller@cs.uni-saarland.de), Maria Christakis (maria.christakis@tuwien.ac.at), Jörg Hoffmann (hoffmann@cs.uni-saarland.de)
https://ojs.aaai.org/index.php/ICAPS/article/view/31472
Testing was recently proposed as a method to gain trust in learned action policies in classical planning. Test cases in this setting are states generated by a fuzzing process that performs random walks from the initial state. A fuzzing bias attempts to bias these random walks towards policy bugs, that is, states where the policy performs sub-optimally. Prior work explored a simple fuzzing bias based on policy-trace cost. Here, we investigate this topic more deeply. We introduce three new fuzzing biases based on analyses of policy-trace shape, estimating whether a trace is close to looping back on itself, whether it contains detours, and whether its goal-distance surface does not smoothly decline. Our experiments with two kinds of neural action policies show that these new biases improve bug-finding capabilities in many cases.

PDDL+ Models for Deployable yet Effective Traffic Signal Optimisation
Authors: Anas El Kouaiti (elkouaitianas@gmail.com), Francesco Percassi (f.percassi@hud.ac.uk), Alessandro Saetti (alessandro.saetti@unibs.it), Thomas Leo McCluskey (lee@hud.ac.uk), Mauro Vallati (m.vallati@hud.ac.uk)
https://ojs.aaai.org/index.php/ICAPS/article/view/31473
The use of planning techniques in traffic signal optimisation has proven effective in managing unexpected traffic conditions as well as typical traffic patterns. However, significant challenges concerning the deployability of generated signal strategies remain, as existing approaches tend not to consider constraints and features of the actual real-world infrastructure on which they will be implemented. To address this challenge, we introduce a range of PDDL+ models embodying technological requirements as well as insights from domain experts. The proposed models have been extensively tested on historical data using a range of well-known search strategies and heuristics, as well as alternative encodings. Results demonstrate their competitiveness with the state of the art.

Termination Properties of Transition Rules for Indirect Effects
Authors: Mojtaba Elahi (mojtaba.elahi@aalto.fi), Saurabh Fadnis (saurabh.fadnis@aalto.fi), Jussi Rintanen (jrintanen.jr@gmail.com)
https://ojs.aaai.org/index.php/ICAPS/article/view/31474
Indirect effects of an agent's actions have traditionally been formalized as condition-effect rules that always fire whenever applicable, after each action taken by the agent. In this work, we investigate a core problem of indirect effects: the possibility of arbitrarily or infinitely long sequences of rule firings. Specifically, we investigate the termination of rule firings, as well as their confluence, that is, the uniqueness of the state that is ultimately reached. Both problems turn out to be PSPACE-complete. After this, we devise practically interesting syntactic and structural restrictions that guarantee polynomial-time termination and confluence tests. Finally, in the context of planning languages that support indirect effects, we propose new implementation technologies.

A Fast Algorithm for k-Memory Messaging Scheme Design in Dynamic Environments with Uncertainty
Authors: Zhikang Fan (fanzhikang@ruc.edu.cn), Weiran Shen (shenweiran@ruc.edu.cn)
https://ojs.aaai.org/index.php/ICAPS/article/view/31475
We study the problem of designing the optimal k-memory messaging scheme in a dynamic environment. Specifically, a sender, who can perfectly observe the state of a dynamic environment but cannot take actions, aims to persuade an uninformed, far-sighted receiver to take actions to maximize the long-term utility of the sender, by sending messages. We focus on k-memory messaging schemes, i.e., at each time step, the sender's messaging scheme depends on information from the previous k steps. After receiving a message, the self-interested receiver derives a posterior belief and takes action. The immediate rewards of the two players can be misaligned, so the sender needs to ensure persuasiveness when designing the messaging scheme. We first formulate this problem as a bi-linear program. Then we show that there are infinitely many non-trivial persuasive messaging schemes for any problem instance. Moreover, we show that when the sender uses a k-memory messaging scheme, the optimal strategy for the receiver is also a k-memory strategy. We propose a fast heuristic algorithm for this problem and show that it can be extended to the setting where the sender has threat ability. We experimentally evaluate our algorithm, comparing it with the solution obtained by the Gurobi solver, in terms of performance and running time, in both settings. Extensive experimental results show that our algorithm outperforms the Gurobi solution in running time, yet achieves comparable performance.

SLAMuZero: Plan and Learn to Map for Joint SLAM and Navigation
Authors: Bowen Fang (bf2504@columbia.edu), Xu Chen (xc2412@columbia.edu), Zhengkun Pan (zp2243@columbia.edu), Xuan Di (sharon.di@columbia.edu)
https://ojs.aaai.org/index.php/ICAPS/article/view/31476
MuZero has demonstrated remarkable performance in board and video games, where the Monte Carlo tree search (MCTS) method is utilized to learn and adapt to different game environments. This paper leverages the strength of MuZero to enhance agents' planning capability for joint active simultaneous localization and mapping (SLAM) and navigation tasks, which require an agent to navigate an unknown environment while simultaneously constructing a map and localizing itself. We propose SLAMuZero, a novel approach for joint SLAM and navigation, which employs a search process that uses an explicit encoder-decoder architecture for mapping, followed by a prediction function to evaluate policy and value based on the generated map. SLAMuZero outperforms the state-of-the-art baseline and significantly reduces training time, underscoring the efficiency of our approach. Additionally, we develop a new open source library for implementing SLAMuZero, which is a flexible and modular toolkit for researchers and practitioners (https://github.com/bwfbowen/SLAMuZero).

A Real-Time Rescheduling Algorithm for Multi-robot Plan Execution
Authors: Ying Feng (yingfeng@andrew.cmu.edu), Adittyo Paul (adittyop@andrew.cmu.edu), Zhe Chen (zhe.chen@monash.edu), Jiaoyang Li (jiaoyangli@cmu.edu)
https://ojs.aaai.org/index.php/ICAPS/article/view/31477
One area of research in multi-agent path finding is to determine how replanning can be efficiently achieved in the case of agents being delayed during execution. One option is to reschedule the passing order of agents, i.e., the sequence in which agents visit the same location. In response, we propose Switchable-Edge Search (SES), an A*-style algorithm designed to find optimal passing orders. We prove the optimality of SES and evaluate its efficiency via simulations. The best variant of SES takes less than 1 second for small- and medium-sized problems and runs up to 4 times faster than baselines for large-sized problems.

Towards Feasible Higher-Dimensional Potential Heuristics
Authors: Daniel Fišer (danfis@danfis.cz), Marcel Steinmetz (marcel.steinmetz@laas.fr)
https://ojs.aaai.org/index.php/ICAPS/article/view/31478
Potential heuristics assign numerical values (potentials) to state features, where each feature is a conjunction of facts. It was previously shown that the informativeness of potential heuristics can be significantly improved by considering complex features, but computing potentials over all pairs of facts is already too costly in practice. In this paper, we investigate whether using just a few high-dimensional features instead of all conjunctions up to a dimension n can result in improved heuristics while keeping the computational cost at bay. We focus on (a) establishing a framework for studying this kind of potential heuristics, and (b) whether it is reasonable to expect improvement with just a few conjunctions. For (a), we propose two compilations that encode each conjunction explicitly as a new fact so that we can compute potentials over conjunctions in the original task as one-dimensional potentials in the compilation. Regarding (b), we provide evidence that the informativeness of potential heuristics can be significantly increased with a small set of conjunctions, and that these improvements have a positive impact on the number of solved tasks.
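Editor's illustration (not from the paper above): evaluating a potential heuristic with conjunctive features amounts to summing the potentials of the features that are true in a state. The facts, features, and potential values below are invented; the paper is about how to choose a few informative high-dimensional features, not about this evaluation step.

```python
from typing import Dict, FrozenSet, Set

Feature = FrozenSet[str]          # a feature is a conjunction of facts

def potential_heuristic(state: Set[str], potentials: Dict[Feature, float]) -> float:
    """Sum the potentials of every feature whose facts all hold in the state."""
    return sum(w for feature, w in potentials.items() if feature <= state)

potentials = {
    frozenset({"at(truck,A)"}): 2.0,                   # one-dimensional feature
    frozenset({"at(truck,A)", "empty(truck)"}): 1.5,   # two-dimensional feature
}
state = {"at(truck,A)", "empty(truck)", "at(pkg,B)"}
print(potential_heuristic(state, potentials))  # 3.5
```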
Progressive State Space Disaggregation for Infinite Horizon Dynamic Programming
Authors: Orso Forghieri (orso.forghieri@gmail.com), Hind Castel (hind.castel@telecom-sudparis.eu), Emmanuel Hyon (ehyon@parisnanterre.fr), Erwan Le Pennec (erwan.le-pennec@polytechnique.edu)
https://ojs.aaai.org/index.php/ICAPS/article/view/31479
The high dimensionality of model-based Reinforcement Learning and Markov Decision Processes can be reduced using abstractions of the state and action spaces. Although hierarchical learning and state abstraction methods have been explored over the past decades, explicit methods to build useful abstractions of models are rarely provided. In this work, we provide a new state abstraction method for solving infinite horizon problems in the discounted and total settings. Our approach is to progressively disaggregate abstract regions by iteratively slicing aggregations of states relative to a value function. The distinguishing feature of our method, in contrast to previous approximations of the Bellman operator, is the disaggregation of regions during value function iterations (or policy evaluation steps). The objective is to find a more efficient aggregation that reduces the error on each piece of the partition. We provide a proof of convergence for this algorithm without making any assumptions about the structure of the problem. We also show that this process decreases the computational complexity of the Bellman operator iteration and provides useful abstractions. We then plug this state space disaggregation process into classical Dynamic Programming algorithms, namely Approximate Value Iteration, Q-Value Iteration, and Policy Iteration. Finally, we conduct a numerical comparison on randomly generated MDPs as well as classical MDPs. These experiments show that our policy-based algorithm is faster than both the traditional dynamic programming approach and recent aggregative methods that use a fixed number of adaptive partitions.

JaxPlan and GurobiPlan: Optimization Baselines for Replanning in Discrete and Mixed Discrete-Continuous Probabilistic Domains
Authors: Michael Gimelfarb (mike.gimelfarb@mail.utoronto.ca), Ayal Taitler (ataitler@gmail.com), Scott Sanner (ssanner@gmail.com)
https://ojs.aaai.org/index.php/ICAPS/article/view/31480
Replanning methods that determinize a stochastic planning problem and replan at each action step have long been known to provide strong baseline (and even competition winning) solutions to discrete probabilistic planning problems. Recent work has explored the extension of replanning methods to the case of mixed discrete-continuous probabilistic domains by leveraging MILP compilations of the RDDL specification language. Other recent advances in probabilistic planning have explored the compilation of structured mixed discrete-continuous RDDL domains into a determinized computation graph that also lends itself to replanning via so-called planning by backpropagation methods. However, to date, there has not been any comprehensive comparison of these recent optimization-based replanning methodologies to the state-of-the-art winner of the discrete probabilistic IPC 2011 and 2014 and runner-up in 2018 (PROST) and the winner of the mixed discrete-continuous probabilistic IPC 2023 (DiSProd). In this paper, we describe JaxPlan, which makes several extensive upgrades to planning by backpropagation and its compact tensorized compilation from RDDL to a JAX computation graph that uses discrete relaxations and a sample average approximation. We also provide the first detailed overview of a compilation of the RDDL language specification to Gurobi's Mixed Integer Nonlinear Programming (MINLP) solver that we term GurobiPlan. We provide a comprehensive comparative analysis of JaxPlan and GurobiPlan with competition winning planners on 19 domains and a total of 155 instances to assess their performance across (a) different domains, (b) different instance sizes, and (c) different time budgets. We also release all code to reproduce the results along with the open-source planners we describe in this work.

Formal Representations of Classical Planning Domains
Authors: Claudia Grundke (claudia.grundke@unibas.ch), Gabriele Röger (gabriele.roeger@unibas.ch), Malte Helmert (malte.helmert@unibas.ch)
https://ojs.aaai.org/index.php/ICAPS/article/view/31481
Planning domains are an important notion, e.g. when it comes to restricting the input for generalized planning or learning approaches. However, domains as specified in PDDL cannot fully capture the intuitive understanding of a planning domain. We close this semantic gap and propose using PDDL axioms to characterize the (typically infinite) set of legal tasks of a domain. A minor extension makes it possible to express all properties that can be determined in polynomial time. We demonstrate the suitability of the approach on established domains from the International Planning Competition.

Safe Explicable Planning
Authors: Akkamahadevi Hanni (ahanni@asu.edu), Andrew Boateng (aoboaten@asu.edu), Yu Zhang (yzhan442@asu.edu)
https://ojs.aaai.org/index.php/ICAPS/article/view/31482
Human expectations arise from their understanding of others and the world. In the context of human-AI interaction, this understanding may not align with reality, leading to the AI agent failing to meet expectations and compromising team performance. Explicable planning, introduced as a method to bridge this gap, aims to reconcile human expectations with the agent's optimal behavior, facilitating interpretable decision-making. However, an unresolved critical issue is ensuring safety in explicable planning, as it could result in explicable behaviors that are unsafe. To address this, we propose Safe Explicable Planning (SEP), which extends the prior work to support the specification of a safety bound. The goal of SEP is to find behaviors that align with human expectations while adhering to the specified safety criterion. Our approach generalizes the consideration of multiple objectives stemming from multiple models rather than a single model, yielding a Pareto set of safe explicable policies. We present both an exact method, guaranteeing finding the Pareto set, and a more efficient greedy method that finds one of the policies in the Pareto set. Additionally, we offer approximate solutions based on state aggregation to improve scalability. We provide formal proofs that validate the desired theoretical properties of these methods. Evaluation through simulations and physical robot experiments confirms the effectiveness of our approach for safe explicable planning.

Replanning in Advance for Instant Delay Recovery in Multi-Agent Applications: Rerouting Trains in a Railway Hub
Authors: Issa K. Hanou (i.k.hanou@tudelft.nl), Devin Wild Thomas (dwt@cs.unh.edu), Wheeler Ruml (ruml@cs.unh.edu), Mathijs de Weerdt (m.m.deweerdt@tudelft.nl)
https://ojs.aaai.org/index.php/ICAPS/article/view/31483
Train routing is sensitive to delays that occur in the network. When a train is delayed, it is imperative that a new plan be found quickly, or else other trains may need to be stopped to ensure safety, potentially causing cascading delays. In this paper, we consider this class of multi-agent planning problems, which we call Multi-Agent Execution Delay Replanning. We show that these can be solved by reducing the problem to an any-start-time safe interval planning problem. When an agent has an any-start-time plan, it can react to a delay by simply looking up the precomputed plan for the delayed start time. We identify crucial real-world problem characteristics like the agent's speed, size, and safety envelope, and extend the any-start-time planning to account for them. Experimental results on real-world train networks show that any-start-time plans are compact and can be computed in reasonable time while enabling agents to instantly recover a safe plan.

An Analysis of the Decidability and Complexity of Numeric Additive Planning
Authors: Hayyan Helal (helal@kbsg.rwth-aachen.de), Gerhard Lakemeyer (gerhard@cs.rwth-aachen.de)
https://ojs.aaai.org/index.php/ICAPS/article/view/31484
In this paper, we first define numeric additive planning (NAP), a planning formulation equivalent to Hoffmann's Restricted Tasks over Integers. Then, we analyze the minimal number of action repetitions required for a solution, since planning turns out to be decidable as long as such numbers can be calculated for all actions. We differentiate between two kinds of repetitions and solve for one by integer linear programming and the other by search. Additionally, we characterize the differences between propositional planning and NAP regarding these two kinds. To achieve this, we define so-called multi-valued partial order plans, a novel compact plan representation. Finally, we consider decidable fragments of NAP and their complexity.

Versatile Cost Partitioning with Exact Sensitivity Analysis
Authors: Paul Höft (paul.hoft@liu.se), David Speck (david.speck@liu.se), Florian Pommerening (florian.pommerening@unibas.ch), Jendrik Seipp (jendrik.seipp@liu.se)
https://ojs.aaai.org/index.php/ICAPS/article/view/31485
Saturated post-hoc optimization is a powerful method for computing admissible heuristics for optimal classical planning. The approach solves a linear program (LP) for each state encountered during the search, which is computationally demanding. In this paper, we theoretically and empirically analyze to what extent we can reuse an LP solution of one state for another. We introduce a novel sensitivity analysis that can exactly characterize the set of states for which a unique LP solution is optimal. Furthermore, we identify two properties of the underlying LPs that affect reusability. Finally, we introduce an algorithm that optimizes LP solutions to generalize well to other states. Our new algorithms significantly reduce the number of necessary LP computations.

Expressiveness of Graph Neural Networks in Planning Domains
Authors: Rostislav Horčík (rostislav.horcik@gmail.com), Gustav Šír (gustav.sir@cvut.cz)
https://ojs.aaai.org/index.php/ICAPS/article/view/31486
Graph Neural Networks (GNNs) have become the standard method of choice for learning with structured data, demonstrating particular promise in classical planning. Their inherent invariance under symmetries of the input graphs endows them with superior generalization capabilities compared to their symmetry-oblivious counterparts. However, this comes at the cost of limited expressive power. In particular, GNNs cannot distinguish between graphs that satisfy identical sentences of C2 logic. To leverage GNNs for learning policies in PDDL domains, one needs to encode the contextual representation of the planning states as graphs. The expressiveness of this encoding, coupled with a specific GNN architecture, then hinges on the absence of indistinguishable states necessitating distinct actions. This paper provides a comprehensive theoretical and statistical exploration of such situations in PDDL domains across diverse natural encoding schemes and GNN models.

Converting Simple Temporal Networks with Uncertainty into Minimal Equivalent Dispatchable Form
Authors: Luke Hunsberger (hunsberger@vassar.edu), Roberto Posenato (roberto.posenato@univr.it)
https://ojs.aaai.org/index.php/ICAPS/article/view/31487
A Simple Temporal Network with Uncertainty (STNU) is a structure for representing and reasoning about time constraints on actions that may have uncertain durations. An STNU is dynamically controllable (DC) if there exists a dynamic strategy for executing the network that guarantees that all of its constraints will be satisfied no matter how the uncertain durations turn out, within their specified bounds. However, such strategies typically require exponential space. Therefore, converting a DC STNU into a so-called dispatchable form is essential for practical applications. The relevant portions of a real-time execution strategy for a dispatchable STNU can be incrementally constructed during execution, requiring only O(n²) space, while also providing maximum flexibility and minimal computation during the execution of the network. Although existing algorithms can generate equivalent dispatchable STNUs, they do not guarantee a minimal number of edges in the STNU graph. Since the number of edges directly impacts the computations during execution, this paper presents a novel algorithm for converting any dispatchable STNU into an equivalent dispatchable network having a minimal number of edges. The complexity of the algorithm is O(kn³), where k is the number of actions with uncertain durations, and n is the number of timepoints in the network. The paper also provides an empirical evaluation of the reduction in edges achieved by the new algorithm.
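Editor's background sketch (not from the paper above): a plain Simple Temporal Network, without uncertainty, can be viewed as a distance graph and checked for consistency by detecting negative cycles with Bellman-Ford. Dispatchability and dynamic controllability of STNUs, the topic of the entry above, need considerably more machinery than this; the example constraints are invented.

```python
def stn_consistent(num_timepoints: int, edges) -> bool:
    """edges: list of (u, v, w) meaning time[v] - time[u] <= w.
    Consistent iff the distance graph contains no negative cycle (Bellman-Ford)."""
    dist = [0.0] * num_timepoints          # a virtual source connected to all timepoints
    for _ in range(num_timepoints):        # relax all edges enough times
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    # If any edge can still be relaxed, there is a negative cycle: inconsistent.
    return all(dist[u] + w >= dist[v] for u, v, w in edges)

# B - A <= 10 and A - B <= -3 (B at least 3 after A): consistent.
print(stn_consistent(2, [(0, 1, 10), (1, 0, -3)]))   # True
# Adding A - B <= -20 (B at least 20 after A) contradicts B - A <= 10: inconsistent.
print(stn_consistent(2, [(0, 1, 10), (1, 0, -20)]))  # False
```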
Rethinking Mutual Information for Language Conditioned Skill Discovery on Imitation Learning
Authors: Zhaoxun Ju (dljzx@hotmail.com), Chao Yang (yangchao@pjlab.org.cn), Fuchun Sun (fcsun@mail.tsinghua.edu.cn), Hongbo Wang (wanghongbo@fudan.edu.cn), Yu Qiao (qiaoyu@pjlab.org.cn)
https://ojs.aaai.org/index.php/ICAPS/article/view/31488
Language-conditioned robot behavior plays a vital role in executing complex tasks by associating human commands or instructions with perception and actions. The ability to compose long-horizon tasks based on unconstrained language instructions necessitates the acquisition of a diverse set of general-purpose skills. However, acquiring inherent primitive skills in a coupled and long-horizon environment without external rewards or human supervision presents significant challenges. In this paper, we evaluate the relationship between skills and language instructions from a mathematical perspective, employing two forms of mutual information within the framework of language-conditioned policy learning. To maximize the mutual information between language and skills in an unsupervised manner, we propose an end-to-end imitation learning approach known as Language Conditioned Skill Discovery (LCSD). Specifically, we utilize vector quantization to learn discrete latent skills and leverage skill sequences of trajectories to reconstruct high-level semantic instructions. Through extensive experiments on language-conditioned robotic navigation and manipulation tasks, encompassing BabyAI, LORel, and Calvin, we demonstrate the superiority of our method over prior works. Our approach exhibits enhanced generalization capabilities towards unseen tasks, improved skill interpretability, and notably higher rates of task completion success.

Epistemic Exploration for Generalizable Planning and Learning in Non-Stationary Settings
Authors: Rushang Karia (rushang.karia@asu.edu), Pulkit Verma (verma.pulkit@asu.edu), Alberto Speranzon (alberto.speranzon@gmail.com), Siddharth Srivastava (siddharths@asu.edu)
https://ojs.aaai.org/index.php/ICAPS/article/view/31489
This paper introduces a new approach for continual planning and model learning in relational, non-stationary stochastic environments. Such capabilities are essential for the deployment of sequential decision-making systems in the uncertain and constantly evolving real world. Working in such practical settings with unknown (and non-stationary) transition systems and changing tasks, the proposed framework models gaps in the agent's current state of knowledge and uses them to conduct focused, investigative explorations. Data collected using these explorations is used for learning generalizable probabilistic models for solving the current task despite continual changes in the environment dynamics. Empirical evaluations on several non-stationary benchmark domains show that this approach significantly outperforms planning and RL baselines in terms of sample complexity. Theoretical results show that the system exhibits desirable convergence properties when stationarity holds.

Unifying and Certifying Top-Quality Planning
Authors: Michael Katz (ctpelok@gmail.com), Junkyu Lee (junkyu.lee@ibm.com), Shirin Sohrabi (ssohrab@us.ibm.com)
https://ojs.aaai.org/index.php/ICAPS/article/view/31490
The growing utilization of planning tools in practical scenarios has sparked an interest in generating multiple high-quality plans. Consequently, a range of computational problems under the general umbrella of top-quality planning were introduced over a short time period, each with its own definition. In this work, we show that the existing definitions can be unified into one, based on a dominance relation. The different computational problems, therefore, simply correspond to different dominance relations. Given the unified definition, we can now certify the top-quality of the solutions, leveraging existing certification of unsolvability and optimality. We show that task transformations found in the existing literature can be employed for the efficient certification of various top-quality planning problems and propose a novel transformation to efficiently certify loopless top-quality planning.

Explaining Plan Quality Differences
Authors: Benjamin Krarup (benjamin.krarup@kcl.ac.uk), Amanda Coles (amanda.coles@kcl.ac.uk), Derek Long (derek.long@kcl.ac.uk), David E. Smith (david.smith@psresearch.xyz)
https://ojs.aaai.org/index.php/ICAPS/article/view/31491
We describe a method for explaining the differences between the quality of plans produced for similar planning problems. The method exploits a process of abstracting away details of the planning problems until the difference in solution quality they support has been minimised. We give a general definition of a valid abstraction of a planning problem. We then give the details of the implementation of a number of useful abstractions. Finally, we present a breadth-first search algorithm for finding suitable abstractions for explanations and detail the results of an evaluation of the approach.

Planning with a Learned Policy Basis to Optimally Solve Complex Tasks
Authors: David Kuric (d.kuric@uva.nl), Guillermo Infante (guillermo.infante@upf.edu), Vicenç Gómez (vicen.gomez@upf.edu), Anders Jonsson (anders.jonsson@upf.edu), Herke van Hoof (h.c.vanhoof@uva.nl)
https://ojs.aaai.org/index.php/ICAPS/article/view/31492
Conventional reinforcement learning (RL) methods can successfully solve a wide range of sequential decision problems. However, learning policies that can generalize predictably across multiple tasks in a setting with non-Markovian reward specifications is a challenging problem. We propose to use successor features to learn a set of local policies that each solves a well-defined subproblem. In a task described by a finite state automaton (FSA) that involves the same set of subproblems, the combination of these local policies can then be used to generate an optimal solution without additional learning. In contrast to other methods that combine local policies via planning, our method asymptotically attains global optimality, even in stochastic environments.

Action Model Learning from Noisy Traces: a Probabilistic Approach
Authors: Leonardo Lamanna (llamanna@fbk.eu), Luciano Serafini (serafini@fbk.eu)
https://ojs.aaai.org/index.php/ICAPS/article/view/31493
We address the problem of learning planning domains from plan traces that are obtained by observing the environment states through noisy sensors. In such situations, approaches that assume correct traces are not applicable. We tackle the problem by designing a probabilistic graphical model where the preconditions and effects of every planning domain operator, as well as the traces' observations, are modeled by random variables. Probabilistic inference conditioned on the observed traces allows our approach to derive a posterior probability of an atom being a precondition and/or an effect of an operator. Planning domains are obtained either by sampling or by applying the maximum a posteriori criterion. We compare our approach with a frequentist baseline and the currently available state-of-the-art approaches. We measure the performance of each method according to two criteria: reconstruction of the original planning domain and effectiveness in solving new planning problems of the same domain. Our experimental analysis shows that our approach learns action models that are more accurate than those of state-of-the-art approaches, and it strongly outperforms other approaches in generating models that are effective for solving new problems.

Neural Combinatorial Optimization on Heterogeneous Graphs: An Application to the Picker Routing Problem in Mixed-shelves Warehouses
Authors: Laurin Luttmann (laurin.luttmann@leuphana.de), Lin Xie (lin.xie@utwente.nl)
https://ojs.aaai.org/index.php/ICAPS/article/view/31494
In recent years, machine learning (ML) models capable of solving combinatorial optimization (CO) problems have received a surge of attention. While early approaches failed to outperform traditional CO solvers, the gap between handcrafted and learned heuristics has been steadily closing. However, most work in this area has focused on simple CO problems to benchmark new models and algorithms, leaving a gap in the development of methods specifically designed to handle more involved problems. Therefore, this work considers the problem of picker routing in the context of mixed-shelves warehouses, which involves not only a heterogeneous graph representation, but also a combinatorial action space resulting from the integrated selection and routing decisions to be made. We propose both a novel encoder to effectively learn representations of the heterogeneous graph and a hierarchical decoding scheme that exploits the combinatorial structure of the action space. The efficacy of the developed methods is demonstrated through a comprehensive comparison with established architectures as well as exact and heuristic solvers.

Investigating Large Neighbourhood Search for Bus Driver Scheduling
Authors: Tommaso Mannelli Mazzoli (tommaso.mazzoli@tuwien.ac.at), Lucas Kletzander (lucas.kletzander@tuwien.ac.at), Pascal Van Hentenryck (pascal.vanhentenryck@isye.gatech.edu), Nysret Musliu (nysret.musliu@tuwien.ac.at)
https://ojs.aaai.org/index.php/ICAPS/article/view/31495
The Bus Driver Scheduling Problem (BDSP) is a combinatorial optimisation problem with high practical relevance. The aim is to assign bus drivers to predetermined routes while minimising a specified objective function that considers operating costs as well as employee satisfaction. Since we must satisfy several rules from a collective agreement and European regulations, the BDSP is highly constrained. Hence, using exact methods to solve large real-life-based instances is computationally too expensive, while heuristic methods still have a considerable gap to the optimum. Our paper presents a Large Neighbourhood Search (LNS) approach to solve the BDSP. We propose several novel destroy operators and an approach using column generation to repair the sub-problem. We analyse the impact of the destroy and repair operators and investigate various possibilities to select them, including adaptivity. The proposed approach improves all the upper bounds for larger instances that exact methods cannot solve, as well as for some mid-sized instances, and outperforms existing heuristic approaches for this problem on all benchmark instances.

Weak and Strong Reversibility of Non-deterministic Actions: Universality and Uniformity
Authors: Jakub Med (jakub.med@cvut.cz), Lukáš Chrpa (chrpaluk@cvut.cz), Michael Morak (michael.morak@aau.at), Wolfgang Faber (wf@wfaber.com)
https://ojs.aaai.org/index.php/ICAPS/article/view/31496
Classical planning looks for a sequence of actions that transforms the initial state of the environment into a goal state. Studying whether the effects of an action can be undone by a sequence of other actions, that is, action reversibility, is beneficial, for example, in determining whether an action is safe to apply. This paper deals with action reversibility of non-deterministic actions, i.e., actions whose application might result in different outcomes. Inspired by the established notions of weak and strong plans in non-deterministic (or FOND) planning, we define the notions of weak and strong reversibility for non-deterministic actions. We then focus on the universality and uniformity of action reversibility, that is, whether we can always undo all possible effects of the action by the same means (i.e., policy), or whether some of the effects can never be undone. We show how these classes of problems can be solved via classical or FOND planning and evaluate our approaches on FOND benchmark domains.

Preference Explanation and Decision Support for Multi-Objective Real-World Test Laboratory Scheduling
Authors: Florian Mischek (fmischek@dbai.tuwien.ac.at), Nysret Musliu (nysret.musliu@tuwien.ac.at)
https://ojs.aaai.org/index.php/ICAPS/article/view/31497
Complex real-world scheduling problems often include multiple conflicting objectives. Decision makers (DMs) can express their preferences over those objectives in different ways, including as sets of weights which are used in a linear combination of objective values. However, finding good sets of weights that result in solutions with desirable qualities is challenging and currently involves a lot of trial and error. We propose a general method to explain objectives' values under a given set of weights using Shapley regression values. We demonstrate this approach on the Test Laboratory Scheduling Problem (TLSP), for which we propose a multi-objective solution algorithm and show that suggestions for weight adjustments based on the introduced explanations are successful in guiding decision makers towards solutions that match their expectations. This method is included in the TLSP MO-Explorer, a new decision support system that enables the exploration and analysis of high-dimensional Pareto fronts.

Safe Learning of PDDL Domains with Conditional Effects
Authors: Argaman Mordoch (mordocha@post.bgu.ac.il), Enrico Scala (enrico.scala@unibs.it), Roni Stern (roni.stern@gmail.com), Brendan Juba (bjuba@wustl.edu)
https://ojs.aaai.org/index.php/ICAPS/article/view/31498
Powerful domain-independent planners have been developed to solve various types of planning problems.
These planners often require a model of the acting agent's actions, given in some planning domain description language. Manually designing such an action model is a notoriously challenging task. An alternative is to automatically learn action models from observation. Such an action model is called safe if every plan created with it is consistent with the real, unknown action model. Algorithms for learning such safe action models exist, yet they cannot handle domains with conditional or universal effects, which are common constructs in many planning problems. We prove that learning non-trivial safe action models with conditional effects may require an exponential number of samples. Then, we identify reasonable assumptions under which such learning is tractable and propose Conditional-SAM, the first algorithm capable of doing so. We analyze Conditional-SAM theoretically and evaluate it experimentally. Our results show that the action models learned by Conditional-SAM can be used to perfectly solve most of the test-set problems in most of the evaluated domains.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31499SKATE: Successive Rank-based Task Assignment for Proactive Online Planning2024-05-30T05:52:21-07:00Déborah Conforto Nedelmanndeborah.conforto-nedelmann@isae-supaero.frJérôme Lacanjerome.lacan@isae-supaero.frCaroline P. C. Chanelcaroline.chanel@isae-supaero.frThe development of online applications for services such as package delivery, crowdsourcing, or taxi dispatching has drawn the attention of the research community to the domain of online multi-agent multi-task allocation. In online service applications, tasks (or requests) to be performed arrive over time and need to be dynamically assigned to agents. Such planning problems are challenging because: (i) little or no information about future tasks is available for long-term reasoning; (ii) the number of agents, as well as the number of tasks, can be very high; and (iii) an efficient solution has to be reached in a limited amount of time. In this paper, we propose SKATE, a successive rank-based task assignment algorithm for online multi-agent planning. SKATE can be seen as a meta-heuristic approach which successively assigns a task to the best-ranked agent until all tasks have been assigned. We assessed the complexity of SKATE and showed it is cubic in the number of agents and tasks. To investigate how multi-agent multi-task assignment algorithms perform under a high number of agents and tasks, we compare three multi-task assignment methods in synthetic and real data benchmark environments: Integer Linear Programming (ILP), Genetic Algorithm (GA), and SKATE. In addition, a proactive approach is embedded in all methods to determine near-future available agents (resources) using a receding-horizon. Based on the results obtained, we can argue that the classical ILP offers the best-quality solutions when handling a low number of agents and tasks, i.e. low load, regardless of the receding-horizon size, while it struggles to respect the time constraint under high load.
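To make the successive rank-based assignment idea in the SKATE abstract above concrete, the following is a minimal editorial sketch, not SKATE itself: it assumes a generic score(agent, task) ranking function (a placeholder, not the paper's ranking) and simply gives each task, in turn, to the currently best-ranked free agent; SKATE's proactive, receding-horizon handling of future tasks is not modeled.

```python
from typing import Callable, Dict, Hashable, List

def successive_rank_based_assignment(
    agents: List[Hashable],
    tasks: List[Hashable],
    score: Callable[[Hashable, Hashable], float],
) -> Dict[Hashable, Hashable]:
    """Greedy sketch: assign each task, in turn, to the best-ranked free agent.

    `score` is a placeholder ranking function (higher is better); it stands in
    for whatever ranking an actual online allocator would use.
    """
    free_agents = set(agents)
    assignment: Dict[Hashable, Hashable] = {}
    for task in tasks:
        if not free_agents:
            break  # no agent left for the remaining tasks
        # Rank all currently free agents for this task and pick the best one.
        best_agent = max(free_agents, key=lambda a: score(a, task))
        assignment[task] = best_agent
        free_agents.remove(best_agent)
    return assignment

# Toy usage: rank agents by (negated) distance to the task location.
if __name__ == "__main__":
    positions = {"a1": 0.0, "a2": 5.0, "a3": 9.0}
    task_locations = {"t1": 8.0, "t2": 1.0}
    result = successive_rank_based_assignment(
        agents=list(positions),
        tasks=list(task_locations),
        score=lambda a, t: -abs(positions[a] - task_locations[t]),
    )
    print(result)  # {'t1': 'a3', 't2': 'a1'}
```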
SKATE performs better than the other methods in high load conditions, and even better when a variable receding-horizon is used.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31500Incremental Ordering for Scheduling Problems2024-05-30T05:52:23-07:00Stefan Neubertstefan.neubert@hpi.deKatrin Caselkatrin.casel@hu-berlin.deGiven an instance of a scheduling problem where we want to start executing jobs as soon as possible, it is advantageous if a scheduling algorithm emits the first parts of its solution early, in particular before the algorithm completes its work. Therefore, in this position paper, we analyze core scheduling problems in regards to their enumeration complexity, i.e. the computation time to the first emitted schedule entry (preprocessing time) and the worst case time between two consecutive parts of the solution (delay). Specifically, we look at scheduling instances that reduce to ordering problems. We apply a known incremental sorting algorithm for scheduling strategies that are at their core comparison-based sorting algorithms and translate corresponding upper and lower complexity bounds to the scheduling setting. For instances with n jobs and a precedence DAG with maximum degree Δ, we incrementally build a topological ordering with O(n) preprocessing and O(Δ) delay. We prove a matching lower bound and show with an adversary argument that the delay lower bound holds even in case the DAG has constant average degree and the ordering is emitted out-of-order in the form of insert operations. We complement our theoretical results with experiments that highlight the improved time-to-first-output and discuss research opportunities for similar incremental approaches for other scheduling problems.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31501Lookahead Pathology in Monte-Carlo Tree Search2024-05-30T05:52:24-07:00Khoi P. N. Nguyenkhoi.nguyen6@utdallas.eduRaghuram Ramanujanraramanujan@davidson.eduMonte-Carlo Tree Search (MCTS) is a search paradigm that first found prominence with its success in the domain of computer Go. Early theoretical work established the soundness and convergence bounds for Upper Confidence bounds applied to Trees (UCT), the most popular instantiation of MCTS; however, there remain notable gaps in our understanding of how UCT behaves in practice. In this work, we address one such gap by considering the question of whether UCT can exhibit lookahead pathology in adversarial settings --- a paradoxical phenomenon first observed in Minimax search where greater search effort leads to worse decision-making. We introduce a novel family of synthetic games that offer rich modeling possibilities while remaining amenable to mathematical analysis. 
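The emit-as-you-go idea in the Incremental Ordering for Scheduling Problems abstract above can be illustrated with a plain Kahn-style generator that yields one schedule entry at a time. This is an editorial sketch only; it does not reproduce the paper's O(n) preprocessing / O(Δ) delay construction, and the precedence encoding below is an assumption for illustration.

```python
from collections import deque
from typing import Dict, Hashable, Iterator, List

def incremental_topological_order(
    precedence: Dict[Hashable, List[Hashable]]
) -> Iterator[Hashable]:
    """Yield jobs one at a time in a precedence-respecting order.

    `precedence` maps each job to the jobs that must come after it (its
    successors in the precedence DAG). A job is emitted as soon as all of its
    predecessors have been emitted, so a consumer can start executing early
    entries before the whole ordering has been computed.
    """
    indegree: Dict[Hashable, int] = {job: 0 for job in precedence}
    for successors in precedence.values():
        for succ in successors:
            indegree[succ] = indegree.get(succ, 0) + 1
    ready = deque(job for job, deg in indegree.items() if deg == 0)
    emitted = 0
    while ready:
        job = ready.popleft()
        emitted += 1
        yield job
        for succ in precedence.get(job, []):
            indegree[succ] -= 1
            if indegree[succ] == 0:
                ready.append(succ)
    if emitted != len(indegree):
        raise ValueError("precedence constraints contain a cycle")

# Toy usage: 'prep' before 'build' and 'test'; 'build' before 'test'.
if __name__ == "__main__":
    dag = {"prep": ["build", "test"], "build": ["test"], "test": []}
    for entry in incremental_topological_order(dag):
        print(entry)  # prints prep, build, test, one entry at a time
```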
Our theoretical and experimental results suggest that UCT is indeed susceptible to pathological behavior in a range of games drawn from this family.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31502Large Language Models as Planning Domain Generators2024-05-30T05:52:26-07:00James Oswaldjamesoswald111@gmail.comKavitha Srinivaskavitha.srinivas@ibm.comHarsha Kokelharsha.kokel@ibm.comJunkyu Leejunkyu.lee@ibm.comMichael Katzctpelok@gmail.comShirin Sohrabissohrab@us.ibm.comDeveloping domain models is one of the few remaining places that require manual human labor in AI planning. Thus, in order to make planning more accessible, it is desirable to automate the process of domain model generation. To this end, we investigate if large language models (LLMs) can be used to generate planning domain models from simple textual descriptions. Specifically, we introduce a framework for automated evaluation of LLM-generated domains by comparing the sets of plans for domain instances. Finally, we perform an empirical analysis of 7 large language models, including coding and chat models across 9 different planning domains, and under three classes of natural language domain descriptions. Our results indicate that LLMs, particularly those with high parameter counts, exhibit a moderate level of proficiency in generating correct planning domains from natural language descriptions. Our code is available at https://github.com/IBM/NL2PDDL.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31503On the Prospects of Incorporating Large Language Models (LLMs) in Automated Planning and Scheduling (APS)2024-05-30T05:52:27-07:00Vishal Pallaganivishalp@mailbox.sc.eduBharath Chandra Muppasanibharath@email.sc.eduKaushik Roykaushikr@email.sc.eduFrancesco Fabianoffabiano@nmsu.eduAndrea Loreggiaandrea.loreggia@gmail.comKeerthiram Murugesankeerthi166@gmail.comBiplav Srivastavabiplav.srivastava@gmail.comFrancesca Rossifrancesca.rossi2@ibm.comLior Horeshlhoresh@us.ibm.comAmit Shethamit@sc.eduAutomated Planning and Scheduling is among the growing areas in Artificial Intelligence (AI) where mention of LLMs has gained popularity. Based on a comprehensive review of 126 papers, this paper investigates eight categories based on the unique applications of LLMs in addressing various aspects of planning problems: language translation, plan generation, model construction, multi-agent planning, interactive planning, heuristics optimization, tool integration, and brain-inspired planning. For each category, we articulate the issues considered and existing gaps. A critical insight resulting from our review is that the true potential of LLMs unfolds when they are integrated with traditional symbolic planners, pointing towards a promising neuro-symbolic approach. This approach effectively combines the generative aspects of LLMs with the precision of classical planning methods. By synthesizing insights from existing literature, we underline the potential of this integration to address complex planning challenges. Our goal is to encourage the ICAPS community to recognize the complementary strengths of LLMs and symbolic planners, advocating for a direction in automated planning that leverages these synergistic capabilities to develop more advanced and intelligent planning systems. 
We aim to keep the categorization of papers updated on https://ai4society.github.io/LLM-Planning-Viz/, a collaborative resource that allows researchers to contribute and add new literature to the categorization.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31504Transition Landmarks from Abstraction Cuts2024-05-30T05:52:30-07:00Florian Pommereningflorian.pommerening@unibas.chClemens Büchnerclemens.buechner@unibas.chThomas Kellertho.keller@unibas.chWe introduce transition-counting constraints as a principled tool to formalize constraints that must hold in every solution of a transition system. We then show how to obtain transition landmark constraints from abstraction cuts. Transition landmarks dominate operator landmarks in theory but require solving a linear program that is prohibitively large in practice. We compare different constraints that project away transition-counting variables and then further relax the constraint. For one important special case, we provide a lossless projection. We finally discuss efficient data structures to derive cuts from abstractions and store them in a way that avoids repeated computation in every state. We compare the resulting heuristics both theoretically and on benchmarks from the international planning competition.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31505Computing Planning Centroids and Minimum Covering States Using Symbolic Bidirectional Search2024-05-30T05:52:31-07:00Alberto Pozancoalberto.pozancolancho@jpmorgan.comÁlvaro Torralbaalto@cs.aau.dkDaniel Borrajodaniel.borrajo@jpmchase.comIn some scenarios, planning agents might be interested in reaching states that keep certain relationships with respect to a set of goals. Recently, two of these types of states were proposed: centroids, which minimize the average distance to the goals; and minimum covering states, which minimize the maximum distance to the goals. Previous approaches compute these states by searching forward either in the original or a reformulated task. In this paper, we propose several algorithms that use symbolic bidirectional search to efficiently compute centroids and minimum covering states. Experimental results in existing and novel benchmarks show that our algorithms scale much better than previous approaches, establishing a new state-of-the-art technique for this problem.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31506SayNav: Grounding Large Language Models for Dynamic Planning to Navigation in New Environments2024-05-30T05:52:32-07:00Abhinav Rajvanshiabhinav.rajvanshi@sri.comKaran Sikkakaran.sikka@sri.comXiao Linxiao.lin@sri.comBhoram Leebhoram.lee@sri.comHan-Pang Chiuhchiu@sarnoff.comAlvaro Velasquezalvarovelasquezucf@gmail.comSemantic reasoning and dynamic planning capabilities are crucial for an autonomous agent to perform complex navigation tasks in unknown environments. It requires a large amount of common-sense knowledge, that humans possess, to succeed in these tasks. We present SayNav, a new approach that leverages human knowledge from Large Language Models (LLMs) for efficient generalization to complex navigation tasks in unknown large-scale environments. 
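The definitions in the Computing Planning Centroids and Minimum Covering States abstract above translate directly into a selection rule: among candidate states, a centroid minimizes the average distance to the goals and a minimum covering state minimizes the maximum distance. The sketch below assumes a precomputed distance table and is illustrative only; the paper's symbolic bidirectional search, which is where the actual work lies, is not modeled.

```python
from statistics import mean
from typing import Dict, Hashable, List, Tuple

def centroid_and_min_covering(
    dist: Dict[Hashable, Dict[Hashable, float]],
    goals: List[Hashable],
) -> Tuple[Hashable, Hashable]:
    """Pick a centroid (minimum average goal distance) and a minimum covering
    state (minimum maximum goal distance) from a precomputed distance table.

    `dist[s][g]` is assumed to hold the distance from candidate state `s` to
    goal `g`; computing these distances efficiently is the hard part that the
    paper addresses with symbolic bidirectional search.
    """
    centroid = min(dist, key=lambda s: mean(dist[s][g] for g in goals))
    covering = min(dist, key=lambda s: max(dist[s][g] for g in goals))
    return centroid, covering

# Toy usage with three candidate states and two goals.
if __name__ == "__main__":
    table = {
        "s1": {"g1": 1.0, "g2": 9.0},  # average 5.0, maximum 9.0
        "s2": {"g1": 4.0, "g2": 5.0},  # average 4.5, maximum 5.0
        "s3": {"g1": 6.0, "g2": 2.0},  # average 4.0, maximum 6.0
    }
    print(centroid_and_min_covering(table, goals=["g1", "g2"]))  # ('s3', 's2')
```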
SayNav uses a novel grounding mechanism that incrementally builds a 3D scene graph of the explored environment as input to LLMs for generating feasible and contextually appropriate high-level plans for navigation. The LLM-generated plan is then executed by a pre-trained low-level planner that treats each planned step as a short-distance point-goal navigation sub-task. SayNav dynamically generates step-by-step instructions during navigation and continuously refines future steps based on newly perceived information. We evaluate SayNav on the multi-object navigation (MultiON) task, which requires the agent to utilize a massive amount of human knowledge to efficiently search for multiple different objects in an unknown environment. We also introduce a benchmark dataset for the MultiON task employing the ProcTHOR framework, which provides large photo-realistic indoor environments with a variety of objects. SayNav achieves state-of-the-art results and even outperforms an oracle-based baseline with strong ground-truth assumptions by more than 8% in terms of success rate, highlighting its ability to generate dynamic plans for successfully locating objects in large-scale new environments. The code, benchmark dataset and demonstration videos are accessible at https://www.sri.com/ics/computer-vision/saynav.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31507Online Control of Adaptive Large Neighborhood Search Using Deep Reinforcement Learning2024-05-30T05:52:35-07:00Robbert Reijnenr.v.j.reijnen@tue.nlYingqian Zhangyqzhang@tue.nlHoong Chuin Lauhclau@smu.edu.sgZaharah Bukhshz.bukhsh@tue.nlThe Adaptive Large Neighborhood Search (ALNS) algorithm has shown considerable success in solving combinatorial optimization problems (COPs). Nonetheless, the performance of ALNS relies on the proper configuration of its selection and acceptance parameters, which is known to be a complex and resource-intensive task. To address this, we introduce a Deep Reinforcement Learning (DRL) based approach called DR-ALNS that selects operators, adjusts parameters, and controls the acceptance criterion throughout the search. The proposed method aims to learn, based on the state of the search, to configure ALNS for the next iteration to yield more effective solutions for the given optimization problem. We evaluate the proposed method on an orienteering problem with stochastic weights and time windows, as presented in an IJCAI competition. The results show that our approach outperforms vanilla ALNS, ALNS tuned with Bayesian optimization, and two state-of-the-art DRL approaches that were the winning methods of the competition, achieving this with significantly fewer training observations. Furthermore, we demonstrate several good properties of the proposed DR-ALNS method: it is easily adapted to solve different routing problems, its learned policies perform consistently well across various instance sizes, and these policies can be directly applied to different problem variants.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31508Map Connectivity and Empirical Hardness of Grid-based Multi-Agent Pathfinding Problem2024-05-30T05:52:36-07:00Jingyao Renjingyaor@usc.eduEric Ewingeric_ewing@brown.eduT. K.
Satish Kumartkskwork@gmail.comSven Koenigskoenig@usc.eduNora Ayaniannora_ayanian@brown.eduWe present an empirical study of the relationship between map connectivity and the empirical hardness of the multi-agent pathfinding (MAPF) problem. By analyzing the second smallest eigenvalue (commonly known as lambda2) of the normalized Laplacian matrix of different maps, our initial study indicates that maps with smaller lambda2 tend to create more challenging instances when agents are generated uniformly at random. Additionally, we introduce a map generator based on Quality Diversity (QD) that is capable of producing maps with specified lambda2 ranges, offering a possible way to generate challenging MAPF instances. Despite the absence of a strictly monotonic correlation between lambda2 and the empirical hardness of MAPF, this study serves as a valuable initial investigation for gaining a deeper understanding of what makes a MAPF instance hard to solve.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31509The Story So Far on Narrative Planning2024-05-30T05:52:37-07:00Rogelio E. Cardona Riverar.cardona.rivera@utah.eduArnav Jhalaahjhala@ncsu.eduJulie Porteousjulie.porteous@rmit.edu.auR. Michael Youngrmichael.young@utah.eduNarrative planning is the use of automated planning to construct, communicate, and understand stories, a form of information to which human cognition and enaction is pre-disposed. We review the narrative planning problem in a manner suitable as an introduction to the area, survey different plan-based methodologies and affordances for reasoning about narrative, and discuss open challenges relevant to the broader AI community.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31510Learning General Policies for Planning through GPT Models2024-05-30T05:52:38-07:00Nicholas Rossettinicholas.rossetti@unibs.itMassimiliano Tummolomassimiliano.tummolo@uniroma1.itAlfonso Emilio Gerevinialfonso.gerevini@unibs.itLuca Putelliluca.putelli@unibs.itIvan Serinaivan.serina@unibs.itMattia Chiarimattia.chiari@unibs.itMatteo Olivatomatteo.olivato@unibs.itTransformer-based architectures, such as T5, BERT and GPT, have demonstrated revolutionary capabilities in Natural Language Processing. Several studies showed that deep learning models using these architectures not only possess remarkable linguistic knowledge, but they also exhibit forms of factual knowledge, common sense, and even programming skills. However, the scientific community still debates their reasoning capabilities, which have been recently tested in the context of automated AI planning; the literature presents mixed results, and the prevailing view is that current transformer-based models may not be adequate for planning. In this paper, we address this challenge differently. We introduce a GPT-based model customised for planning (PLANGPT) to learn a general policy for classical planning by training the model from scratch with a dataset of solved planning instances. Once PLANGPT has been trained for a domain, it can be used to generate a solution plan for an input problem instance in that domain. Our training procedure exploits automated planning knowledge to enhance the performance of the trained model. We build and evaluate our GPT model with several planning domains, and we compare its performance w.r.t.
other recent deep learning techniques for generalised planning, demonstrating the effectiveness of the proposed approach.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31511Efficiently Computing Transitions in Cartesian Abstractions2024-05-30T05:52:40-07:00Jendrik Seippjendrik.seipp@liu.seCounterexample-guided Cartesian abstraction refinement yields strong heuristics for optimal classical planning. The approach iteratively finds a new abstract solution, checks where it fails for the original task and refines the abstraction to avoid the same failure in subsequent iterations. The main bottleneck of this refinement loop is the memory needed for storing all abstract transitions. To address this issue, we introduce an algorithm that efficiently computes abstract transitions on demand. This drastically reduces the memory consumption and allows us to solve tasks during the refinement loop and during the search that were previously out of reach.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31512Imitating Cost-Constrained Behaviors in Reinforcement Learning2024-05-30T05:52:41-07:00Qian Shaoqianshao.2020@phdcs.smu.edu.sgPradeep Varakanthampradeepv@smu.edu.sgShih-Fen Chengsfcheng@smu.edu.sgComplex planning and scheduling problems have long been solved using various optimization or heuristic approaches. In recent years, imitation learning that aims to learn from expert demonstrations has been proposed as a viable alternative to solving these problems. Generally speaking, imitation learning is designed to learn either the reward (or preference) model or directly the behavioral policy by observing the behavior of an expert. Existing work in imitation learning and inverse reinforcement learning has focused on imitation primarily in unconstrained settings (e.g., no limit on fuel consumed by the vehicle). However, in many real-world domains, the behavior of an expert is governed not only by reward (or preference) but also by constraints. For instance, decisions on self-driving delivery vehicles are dependent not only on the route preferences/rewards (depending on past demand data) but also on the fuel in the vehicle and the time available. In such problems, imitation learning is challenging as decisions are not only dictated by the reward model but are also dependent on a cost-constrained model. In this paper, we provide multiple methods that match expert distributions in the presence of trajectory cost constraints through (a) Lagrangian-based method; (b) Meta-gradients to find a good trade-off between expected return and minimizing constraint violation; and (c) Cost-violation-based alternating gradient. We empirically show that leading imitation learning approaches imitate cost-constrained behaviors poorly and our meta-gradient-based approach achieves the best performance.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31513Accelerating Search-Based Planning for Multi-Robot Manipulation by Leveraging Online-Generated Experiences2024-05-30T05:52:42-07:00Yorai Shaoulyshaoul@andrew.cmu.eduItamar Mishaniimishani@andrew.cmu.eduMaxim Likhachevmaxim@cs.cmu.eduJiaoyang Lijiaoyangli@cmu.eduAn exciting frontier in robotic manipulation is the use of multiple arms at once. 
However, planning concurrent motions is a challenging task using current methods. The high-dimensional composite state space renders many well-known motion planning algorithms intractable. Recently, Multi-Agent Path Finding (MAPF) algorithms have shown promise in discrete 2D domains, providing rigorous guarantees. However, widely used conflict-based methods in MAPF assume an efficient single-agent motion planner. This poses challenges in adapting them to manipulation cases where this assumption does not hold, due to the high dimensionality of configuration spaces and the computational bottlenecks associated with collision checking. To this end, we propose an approach for accelerating conflict-based search algorithms by leveraging their repetitive and incremental nature -- making them tractable for use in complex scenarios involving multi-arm coordination in obstacle-laden environments. We show that our method preserves completeness and bounded sub-optimality guarantees, and demonstrate its practical efficacy through a set of experiments with up to 10 robotic arms.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31514Logical Specifications-guided Dynamic Task Sampling for Reinforcement Learning Agents2024-05-30T05:52:45-07:00Yash Shuklayash.shukla@tufts.eduTanushree Burmantanushree.burman@tufts.eduAbhishek N. Kulkarniabhishek.nkulkarni21@gmail.comRobert Wrightrobert.wright@gtri.gatech.eduAlvaro Velasquezalvarovelasquezucf@gmail.comJivko Sinapovjivko.sinapov@tufts.eduReinforcement Learning (RL) has made significant strides in enabling artificial agents to learn diverse behaviors. However, learning an effective policy often requires a large number of environment interactions. To mitigate sample complexity issues, recent approaches have used high-level task specifications, such as Linear Temporal Logic (LTLf) formulas or Reward Machines (RM), to guide the learning progress of the agent. In this work, we propose a novel approach, called Logical Specifications-guided Dynamic Task Sampling (LSTS), that learns a set of RL policies to guide an agent from an initial state to a goal state based on a high-level task specification, while minimizing the number of environmental interactions. Unlike previous work, LSTS does not assume information about the environment dynamics or the Reward Machine, and dynamically samples promising tasks that lead to successful goal policies. We evaluate LSTS on a gridworld and show that it achieves improved time-to-threshold performance on complex sequential decision-making problems compared to state-of-the-art RM and Automaton-guided RL baselines, such as Q-Learning for Reward Machines and Compositional RL from logical Specifications (DIRL). Moreover, we demonstrate that our method outperforms RM and Automaton-guided RL baselines in terms of sample-efficiency, both in a partially observable robotic task and in a continuous control robotic manipulation task.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31515Merging or Computing Saturated Cost Partitionings? A Merge Strategy for the Merge-and-Shrink Framework2024-05-30T05:52:46-07:00Silvan Sieverssilvan.sievers@unibas.chThomas Kellertho.keller@unibas.chGabriele Rögergabriele.roeger@unibas.chThe merge-and-shrink framework is a powerful tool for computing abstraction heuristics for optimal classical planning. 
Merging is one of its name-giving transformations. It entails computing the product of two factors of a factored transition system. To decide which two factors to merge, the framework uses a merge strategy. While there exist many merge strategies, it is generally unclear what constitutes a strong merge strategy, and a previous analysis shows that there is still lots of room for improvement with existing merge strategies. In this paper, we devise a new scoring function for score-based merge strategies based on answering the question whether merging two factors has any benefits over computing saturated cost partitioning heuristics over the factors instead. Our experimental evaluation shows that our new merge strategy achieves state-of-the-art performance on IPC benchmarks.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31516Decoupled Search for the Masses: A Novel Task Transformation for Classical Planning2024-05-30T05:52:47-07:00David Speckdavid.speck@liu.seDaniel Gnaddaniel.gnad@liu.seAutomated problem reformulation is a common technique in classical planning to identify and exploit problem structures. Decoupled search is an approach that automatically decomposes planning tasks based on their causal structure, often significantly reducing the search effort. However, its broad applicability is limited by the need for specialized algorithms. In this paper, we present an approach that embodies decoupled search for non-optimal planning through a novel task transformation. Specifically, given a task and a decomposition, we create a transformed task such that the state space of the transformed task is isomorphic to that of decoupled search on the original task. This eliminates the need for specialized algorithms and allows the use of various planning technology in the decoupled-search framework. Empirical evaluation shows that our method is empirically competitive with specialized decoupled algorithms and favorable to other related problem reformulation techniques.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31517Explaining the Space of SSP Policies via Policy-Property Dependencies: Complexity, Algorithms, and Relation to Multi-Objective Planning2024-05-30T05:52:48-07:00Marcel Steinmetzmarcel.steinmetz@laas.frSylvie Thiébauxsylvie.thiebaux@anu.edu.auDaniel Höllerhoeller@cs.uni-saarland.deFlorent Teichteil-Königsbuchflorent.teichteil-koenigsbuch@airbus.comStochastic shortest path (SSP) problems are a common framework for planning under uncertainty. However, the reactive structure of their solution policies is typically not easily comprehensible by an end-user, nor do planners justify the reasons behind their choice of a particular policy over others. To strengthen confidence in the planner's decision-making, recent work in classical planning has introduced a framework for explaining to the user the possible solution space in terms of necessary trade-offs between user-provided plan properties. Here, we extend this framework to SSPs. We introduce a notion of policy properties taking into account action-outcome uncertainty. We analyze formally the computational problem of identifying the exclusion relationships between policy properties, showing that this problem is in fact harder than SSP planning in a complexity theoretical sense. 
We show that all the relationships can be identified through a series of heuristic searches, which, if ordered in a clever way, yields an anytime algorithm. Further, we introduce an alternative method, which leverages a connection to multi-objective probabilistic planning to move all the computational burden to a preprocessing step. Finally, we explore empirically the feasibility of the proposed explanation methodology on a range of adapted IPPC benchmarks.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31518Addressing Myopic Constrained POMDP Planning with Recursive Dual Ascent2024-05-30T05:52:50-07:00Paula Stoccostoccop@stanford.eduSuhas Chundichundi72@stanford.eduArec Jamgochianarec@stanford.eduMykel J. Kochenderfermykel@stanford.eduLagrangian-guided Monte Carlo tree search with global dual ascent has been applied to solve large constrained partially observable Markov decision processes (CPOMDPs) online. In this work, we demonstrate that these global dual parameters can lead to myopic action selection during exploration, ultimately leading to suboptimal decision making. To address this, we introduce history-dependent dual variables that guide local action selection and are optimized with recursive dual ascent. We empirically compare the performance of our approach on a motivating toy example and two large CPOMDPs, demonstrating improved exploration, and ultimately, safer outcomes.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31519Robust Multi-Agent Pathfinding with Continuous Time2024-05-30T05:52:51-07:00Wen Jun Tanwjtan@ntu.edu.sgXueyan Tangasxytang@ntu.edu.sgWentong Caiaswtcai@ntu.edu.sgMulti-Agent Pathfinding (MAPF) is the problem of finding plans for multiple agents such that every agent moves from its start location to its goal location without collisions. If unexpected events delay some agents during plan execution, it may not be possible for the agents to continue following their plans without causing any collision. We define and solve a T-robust MAPF problem that seeks plans that can be followed even if some delays occur, under the generalized MAPFR setting with continuous time notions. The proposed approach is complete and provides provably optimal solutions. We also develop an exact method for collision detection among agents that can be delayed. We experimentally evaluate our proposed approach in terms of efficiency and plan cost.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31520Multi-Robot Connected Fermat Spiral Coverage2024-05-30T05:52:52-07:00Jingtao Tangtodd.j.tang@gmail.comHang Mahangma@sfu.caWe introduce Multi-Robot Connected Fermat Spiral (MCFS), a novel algorithmic framework for Multi-Robot Coverage Path Planning (MCPP) that adapts Coverage Fermat Spiral (CFS) from the computer graphics community to multi-robot coordination for the first time. MCFS uniquely enables the orchestration of multiple robots to generate coverage paths that contour around arbitrarily shaped obstacles, a feature notably lacking in traditional methods. 
Our framework not only enhances area coverage and optimizes task performance, particularly in terms of makespan, for workspaces rich in irregular obstacles but also addresses the challenges of path continuity and curvature critical for non-holonomic robots by generating smooth paths without decomposing the workspace. MCFS solves MCPP by constructing a graph of isolines and transforming MCPP into a combinatorial optimization problem, aiming to minimize the makespan while covering all vertices. Our contributions include developing a unified CFS version for scalable and adaptable MCPP, extending it to MCPP with novel optimization techniques for cost reduction and path continuity and smoothness, and demonstrating through extensive experiments that MCFS outperforms existing MCPP methods in makespan, path curvature, coverage ratio, and overlapping ratio. Our research marks a significant step in MCPP, showcasing the fusion of computer graphics and automated planning principles to advance the capabilities of multi-robot systems in complex environments. Our code is publicly available at https://github.com/reso1/MCFS.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31521Optimal Infinite Temporal Planning: Cyclic Plans for Priced Timed Automata2024-05-30T05:52:53-07:00Rasmus G. Tollundrasmusgtollund@gmail.comNicklas S. Johansennslorup@gmail.comKristian Ø. Nielsenkgl@cs.aau.dkÁlvaro Torralbaalto@cs.aau.dkKim G. Larsenkristianodum@gmail.comMany applications require infinite plans ---i.e. an infinite sequence of actions--- in order to carry out some given process indefinitely. In addition, it is desirable to guarantee optimality. In this paper, we address this problem in the setting of doubly-priced timed automata, where we show how to efficiently compute ratio-optimal cycles for optimal infinite plans. For efficient computation, we present symbolic λ-deduction (S-λD), an any-time algorithm that uses a symbolic representation (priced zones) to search the state-space with a compact representation of the time constraints. Our approach guarantees termination while arriving at an optimal solution. Our experimental evaluation shows that S-λD outperforms the alternative of searching in the concrete state space; is very robust with respect to fine-grained temporal constraints; and has a very good anytime behaviour.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31522Improving Learnt Local MAPF Policies with Heuristic Search2024-05-30T05:52:55-07:00Rishi Veerapanenirveerapa@andrew.cmu.eduQian Wangpwang649@usc.eduKevin Renkevinren@andrew.cmu.eduArthur Jakobssonajakobss@andrew.cmu.eduJiaoyang Lijiaoyangli@cmu.eduMaxim Likhachevmaxim@cs.cmu.eduMulti-agent path finding (MAPF) is the problem of finding collision-free paths for a team of agents to reach their goal locations. State-of-the-art classical MAPF solvers typically employ heuristic search to find solutions for hundreds of agents but are typically centralized and can struggle to scale when run with short timeouts. Machine learning (ML) approaches that learn policies for each agent are appealing as these could enable decentralized systems and scale well while maintaining good solution quality. Current ML approaches to MAPF have proposed methods that have started to scratch the surface of this potential. 
However, state-of-the-art ML approaches produce "local" policies that only plan for a single timestep and have poor success rates and scalability. Our main idea is that we can improve an ML local policy by using heuristic search methods on the output probability distribution to resolve deadlocks and enable full-horizon planning. We show several model-agnostic ways to use heuristic search with learnt policies that significantly improve the policies' success rates and scalability. To the best of our knowledge, this is the first time ML-based MAPF approaches have scaled to high-congestion scenarios (e.g. 20% agent density).2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31523Neural Action Policy Safety Verification: Applicability Filtering2024-05-30T05:52:56-07:00Marcel Vinzentvinzent@cs.uni-saarland.deJörg Hoffmannhoffmann@cs.uni-saarland.deNeural networks (NN) are an increasingly important representation of action policies pi. Applicability filtering is a commonly used practice in this context, restricting the action selection in pi to only applicable actions. Policy predicate abstraction (PPA) has recently been introduced to verify safety of neural pi, through over-approximating the state space subgraph induced by pi. Thus far, however, PPA does not permit applicability filtering, which is challenging due to the additional constraints that need to be taken into account. Here we overcome that limitation, through a range of algorithmic enhancements. In our experiments, our enhancements achieve several orders of magnitude speed-up over a baseline implementation, bringing PPA with applicability filtering close to the performance of PPA without such filtering.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31524Efficient Approximate Search for Multi-Objective Multi-Agent Path Finding2024-05-30T05:52:58-07:00Fangji Wangwang-fj20@mails.tsinghua.edu.cnHan Zhangzhan645@usc.eduSven Koenigskoenig@usc.eduJiaoyang Lijiaoyangli@cmu.eduThe Multi-Objective Multi-Agent Path Finding (MO-MAPF) problem is the problem of computing collision-free paths for a team of agents while minimizing multiple cost metrics. Most existing MO-MAPF algorithms aim to compute the Pareto frontier. However, the Pareto frontier can be time-consuming to compute. Our first main contribution is BB-MO-CBS-pex, an approximate MO-MAPF algorithm that computes an approximate frontier for a user-specific approximation factor. BB-MO-CBS-pex builds upon BB-MO-CBS, a state-of-the-art MO-MAPF algorithm, and leverages A*pex, a state-of-the-art single-agent multi-objective search algorithm, to speed up different parts of BB-MO-CBS. We also provide two speed-up techniques for BB-MO-CBS-pex. Our second main contribution is BB-MO-CBS-k, which builds upon BB-MO-CBS-pex and computes up to k solutions for a user-provided k-value. BB-MO-CBS-k is useful when it is unclear how to determine an appropriate approximation factor. Our experimental results show that both BB-MO-CBS-pex and BB-MO-CBS-k solved significantly more instances than BB-MO-CBS for different approximation factors and k-values, respectively.
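Approximate frontiers of the kind the BB-MO-CBS-pex abstract above refers to are usually defined through (1+ε)-dominance between cost vectors. The sketch below shows a generic form of that test and a greedy frontier filter; it is an editorial illustration of the notion, not necessarily the exact rule used inside A*pex or BB-MO-CBS-pex.

```python
from typing import Iterable, List, Sequence

def eps_dominates(u: Sequence[float], v: Sequence[float], eps: float) -> bool:
    """True if cost vector u (1+eps)-dominates cost vector v, i.e. u is at
    most a factor (1+eps) worse than v in every objective."""
    return all(ui <= (1.0 + eps) * vi for ui, vi in zip(u, v))

def approximate_frontier(
    solutions: Iterable[Sequence[float]], eps: float
) -> List[Sequence[float]]:
    """Greedy sketch: keep a solution only if no already-kept solution
    (1+eps)-dominates it. The result is one possible eps-approximate
    frontier of the input set (order-dependent, illustrative only)."""
    frontier: List[Sequence[float]] = []
    for cost in sorted(solutions):
        if not any(eps_dominates(kept, cost, eps) for kept in frontier):
            frontier.append(cost)
    return frontier

# Toy usage: with eps = 0.1, (10, 5) covers (11, 5) but not (7, 9).
if __name__ == "__main__":
    costs = [(10.0, 5.0), (11.0, 5.0), (7.0, 9.0)]
    print(approximate_frontier(costs, eps=0.1))  # [(7.0, 9.0), (10.0, 5.0)]
```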
Additionally, we compare BB-MO-CBS-pex with an approximate baseline algorithm derived from BB-MO-CBS and show that BB-MO-CBS-pex achieved speed-ups up to two orders of magnitude.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31525MAPF in 3D Warehouses: Dataset and Analysis2024-05-30T05:53:00-07:00Qian Wangpwang649@usc.eduRishi Veerapanenirveerapa@andrew.cmu.eduYu Wuyuwu3@andrew.cmu.eduJiaoyang Lijiaoyangli@cmu.eduMaxim Likhachevmaxim@cs.cmu.eduRecent works have made significant progress in multi-agent path finding (MAPF), with modern methods being able to scale to hundreds of agents, handle unexpected delays, work in groups, etc. The vast majority of these methods have focused on 2D "grid world" domains. However, modern warehouses often utilize multi-agent robotic systems that can move in 3D, enabling dense storage but resulting in a more complex multi-agent planning problem. Motivated by this, we introduce and experimentally analyze the application of MAPF to 3D warehouse management, and release the first (see http://mapf.info/index.php/Main/Benchmarks) open-source 3D MAPF dataset. We benchmark two state-of-the-art MAPF methods, EECBS and MAPF-LNS2, and show how different hyper-parameters affect these methods across various 3D MAPF problems. We also investigate how the warehouse structure itself affects MAPF performance. Based on our experimental analysis, we find that a fast low-level search is critical for 3D MAPF, EECBS's suboptimality significantly changes the effect of certain CBS techniques, and certain warehouse designs can noticeably influence MAPF scalability and speed. An additional important observation is that, overall, the tested 2D MAPF techniques scaled well to 3D warehouses and demonstrate how the MAPF community's progress in 2D can generalize to 3D warehouses.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31526Learning Generalised Policies for Numeric Planning2024-05-30T05:53:01-07:00Ryan Xiao Wangryan.wang@anu.edu.auSylvie Thiébauxsylvie.thiebaux@anu.edu.auWe extend Action Schema Networks (ASNets) to learn generalised policies for numeric planning, which features quantitative numeric state variables, preconditions and effects. We propose a neural network architecture that can reason about the numeric variables both directly and in context of other variables. We also develop a dynamic exploration algorithm for more efficient training, by better balancing the exploration versus learning tradeoff to account for the greater computational demand of numeric teacher planners. Experimentally, we find that the learned generalised policies are capable of outperforming traditional numeric planners on some domains, and the dynamic exploration algorithm to be on average much faster at learning effective generalised policies than the original ASNets training algorithm.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31527Tightest Admissible Shortest Path2024-05-30T05:53:03-07:00Eyal Weisseyal.weiss@biu.ac.ilAriel Felnerfelner@bgu.ac.ilGal A. Kaminkagalk@cs.biu.ac.ilThe shortest path problem in graphs is fundamental to AI. Nearly all variants of the problem and relevant algorithms that solve them ignore edge-weight computation time and its common relation to weight uncertainty. 
This implies that taking these factors into consideration can potentially lead to a performance boost in relevant applications. Recently, a generalized framework for weighted directed graphs was suggested, where edge-weight can be computed (estimated) multiple times, at increasing accuracy and run-time expense. We build on this framework to introduce the problem of finding the tightest admissible shortest path (TASP); a path with the tightest suboptimality bound on the optimal cost. This is a generalization of the shortest path problem to bounded uncertainty, where edge-weight uncertainty can be traded for computational cost. We present a complete algorithm for solving TASP, with guarantees on solution quality. Empirical evaluation supports the effectiveness of this approach.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31528Neuro-Symbolic Learning of Lifted Action Models from Visual Traces2024-05-30T05:53:04-07:00Kai Xioliver.xi@anu.edu.auStephen Gouldstephen.gould@anu.edu.auSylvie Thiébauxsylvie.thiebaux@anu.edu.auModel-based planners rely on action models to describe available actions in terms of their preconditions and effects. Nonetheless, manually encoding such models is challenging, especially in complex domains. Numerous methods have been proposed to learn action models from examples of plan execution traces. However, high-level information, such as state labels within traces, is often unavailable and needs to be inferred indirectly from raw observations. In this paper, we aim to learn lifted action models from visual traces --- sequences of image-action pairs depicting discrete successive trace steps. We present ROSAME, a differentiable neuRO-Symbolic Action Model lEarner that infers action models from traces consisting of probabilistic state predictions and actions. By combining ROSAME with a deep learning computer vision model, we create an end-to-end framework that jointly learns state predictions from images and infers symbolic action models. Experimental results demonstrate that our method succeeds in both tasks, using different visual state representations, with the learned action models often matching or even surpassing those created by humans.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31529Control in Stochastic Environment with Delays: A Model-based Reinforcement Learning Approach2024-05-30T05:53:05-07:00Zhiyuan Yaozyao9@stevens.eduIonut Florescuifloresc@stevens.eduChihoon Leeclee4@stevens.eduIn this paper we are introducing a new reinforcement learning method for control problems in environments with delayed feedback. Specifically, our method employs stochastic planning, versus previous methods that used deterministic planning. This allows us to embed risk preference in the policy optimization problem. We show that this formulation can recover the optimal policy for problems with deterministic transitions. We contrast our policy with two prior methods from literature. We apply the methodology to simple tasks to understand its features. 
Then, we compare the performance of the methods in controlling multiple Atari games.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31530Contrastive Explanations of Centralized Multi-agent Optimization Solutions2024-05-30T05:53:06-07:00Parisa Zehtabiparisa.zehtabi@jpmorgan.comAlberto Pozancoalberto.pozancolancho@jpmorgan.comAyala Bolchayalabl@shikumil.org.ilDaniel Borrajodaniel.borrajo@jpmchase.comSarit Kraussarit@cs.biu.ac.ilIn many real-world scenarios, agents are involved in optimization problems. Since most of these scenarios are over-constrained, optimal solutions do not always satisfy all agents. Some agents might be unhappy and ask questions of the form “Why does solution S not satisfy property P?”. We propose CMAOE, a domain-independent approach to obtain contrastive explanations by: (i) generating a new solution S′ where property P is enforced, while also minimizing the differences between S and S′; and (ii) highlighting the differences between the two solutions, with respect to the features of the objective function of the multi-agent system. Such explanations aim to help agents understand why the initial solution is better in the context of the multi-agent system than what they expected. We have carried out a computational evaluation that shows that CMAOE can generate contrastive explanations for large multi-agent optimization problems. We have also performed an extensive user study in four different domains that shows that: (i) after being presented with these explanations, humans’ satisfaction with the original solution increases; and (ii) the contrastive explanations generated by CMAOE are preferred or equally preferred by humans over the ones generated by state-of-the-art approaches.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/ICAPS/article/view/31531Bounded-Suboptimal Weight-Constrained Shortest-Path Search via Efficient Representation of Paths2024-05-30T05:53:07-07:00Han Zhangzhan645@usc.eduOren Salzmansalzman.oren@gmail.comAriel Felnerfelner@bgu.ac.ilT. K. Satish Kumartkskwork@gmail.comSven Koenigskoenig@usc.eduIn the Weight-Constrained Shortest-Path (WCSP) problem, given a graph in which each edge is annotated with a cost and a weight, a start state, and a goal state, the task is to compute a minimum-cost path from the start state to the goal state with weight no larger than a given weight limit. While most existing works have focused on solving the WCSP problem optimally, many real-world situations admit a trade-off between efficiency and a suboptimality bound for the path cost. In this paper, we propose the bounded-suboptimal WCSP algorithm WC-A*pex, which is built on the state-of-the-art approximate bi-objective search algorithm A*pex. WC-A*pex uses an approximate representation of paths with similar costs and weights to compute a (1+ε)-suboptimal path, for a given ε. During its search, WC-A*pex avoids storing all paths explicitly and thereby reduces the search effort while still retaining its (1 + ε)-suboptimality bound.
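The WCSP problem stated in the abstract above can be written compactly as a constrained shortest-path program; the rendering below is a standard formulation of that statement, with notation chosen here rather than taken from the paper.

```latex
% Weight-Constrained Shortest Path (WCSP), as described in the abstract:
% minimize path cost subject to a bound W on the accumulated weight,
% over all paths \pi from the start state to the goal state.
\begin{aligned}
  \min_{\pi \in \Pi(s_{\mathrm{start}},\, s_{\mathrm{goal}})} \quad & \sum_{e \in \pi} c(e) \\
  \text{subject to} \quad & \sum_{e \in \pi} w(e) \le W .
\end{aligned}
% A path \pi' is (1+\varepsilon)-suboptimal if it satisfies the weight bound
% and \sum_{e \in \pi'} c(e) \le (1+\varepsilon)\, c^*, where c^* is the
% optimal constrained cost.
```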
https://ojs.aaai.org/index.php/ICAPS/article/view/31531Bounded-Suboptimal Weight-Constrained Shortest-Path Search via Efficient Representation of Paths2024-05-30T05:53:07-07:00Han Zhangzhan645@usc.eduOren Salzmansalzman.oren@gmail.comAriel Felnerfelner@bgu.ac.ilT. K. Satish Kumartkskwork@gmail.comSven Koenigskoenig@usc.eduIn the Weight-Constrained Shortest-Path (WCSP) problem, given a graph in which each edge is annotated with a cost and a weight, a start state, and a goal state, the task is to compute a minimum-cost path from the start state to the goal state with weight no larger than a given weight limit. While most existing works have focused on solving the WCSP problem optimally, many real-world situations admit a trade-off between efficiency and a suboptimality bound for the path cost. In this paper, we propose the bounded-suboptimal WCSP algorithm WC-A*pex, which is built on the state-of-the-art approximate bi-objective search algorithm A*pex. WC-A*pex uses an approximate representation of paths with similar costs and weights to compute a (1+ε)-suboptimal path, for a given ε. During its search, WC-A*pex avoids storing all paths explicitly and thereby reduces the search effort while still retaining its (1+ε)-suboptimality bound. On benchmark road networks, our experimental results show that WC-A*pex with ε = 0.01 (i.e., with a guaranteed suboptimality of at most 1%) achieves a speed-up of up to an order of magnitude over WC-A*, a state-of-the-art WCSP algorithm, and its bounded-suboptimal variant.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligence

https://ojs.aaai.org/index.php/ICAPS/article/view/31532A Counter-Example Based Approach to Probabilistic Conformant Planning2024-05-30T05:53:09-07:00Xiaodi Zhangxiaodi.zhang@anu.edu.auAlban Grastienalban.grastien@cea.frCharles Grettoncharles.gretton@gmail.comThis paper introduces a counter-example-based approach for solving probabilistic conformant planning (PCP) problems. Our algorithm incrementally generates candidate plans and identifies counter-examples until it finds a plan for which the probability of success is above the specified threshold. We prove that the algorithm is sound and complete. We further propose a variation of our algorithm that uses hitting sets to accelerate the generation of candidate plans. Experimental results show that our planner is particularly suited for problems with a high probability threshold.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligence

https://ojs.aaai.org/index.php/ICAPS/article/view/31533Improving the Efficiency and Efficacy of Multi-Agent Reinforcement Learning on Complex Railway Networks with a Local-Critic Approach2024-05-30T05:53:10-07:00Yuan Zhangyzhang@cs.uni-freiburg.deUmashankar Deekshithumashankar.deekshith@deutschebahn.comJianhong Wangjianhong.wang@manchester.ac.ukJoschka Boedeckerjboedeck@informatik.uni-freiburg.deComplex railway networks are challenging real-world multi-agent systems, often involving thousands of agents. Current planning methods depend heavily on expert knowledge to formulate solutions for specific cases and therefore generalize poorly to new scenarios, which has drawn significant attention to multi-agent reinforcement learning (MARL). Despite some successful applications in multi-agent decision-making tasks, MARL is hard to scale to large numbers of agents. This paper rethinks the curse of agents in the centralized-training-decentralized-execution (CTDE) paradigm and proposes a local-critic approach to address the issue. By combining the local critic with the PPO algorithm, we design a deep MARL algorithm denoted as local-critic PPO (LCPPO). In experiments, we evaluate the effectiveness of LCPPO on a complex railway network benchmark, Flatland, with various numbers of agents. Notably, LCPPO shows strong generalizability and robustness under changes to the environment.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligence
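As a hedged sketch of the local-critic idea in the LCPPO entry above (the actual network architecture, its integration with PPO, and the Flatland observation encoding are not reproduced here), the snippet below contrasts a fully centralized critic input, which grows with the number of agents, with a local one built from each agent's nearest neighbours; the names and the k-nearest-neighbour locality rule are assumptions for illustration.

import numpy as np

def critic_inputs(obs, positions, k=3):
    """Hypothetical local-critic input builder, not the paper's architecture.

    obs: (n_agents, obs_dim) local observations.
    positions: (n_agents, 2) agent coordinates used to define locality.
    A fully centralized critic would concatenate all n_agents observations;
    a local critic caps the input size at (k+1) * obs_dim regardless of n.
    """
    inputs = []
    for i in range(len(obs)):
        dists = np.linalg.norm(positions - positions[i], axis=1)
        neighbours = np.argsort(dists)[:k + 1]  # includes agent i itself
        inputs.append(np.concatenate([obs[j] for j in neighbours]))
    return np.stack(inputs)

# 100 agents with 8-dim observations: a centralized critic input would be
# 800-dim per agent, while the local-critic input stays at 32-dim.
rng = np.random.default_rng(0)
obs = rng.normal(size=(100, 8))
pos = rng.uniform(0, 50, size=(100, 2))
print(critic_inputs(obs, pos, k=3).shape)  # (100, 32)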
https://ojs.aaai.org/index.php/ICAPS/article/view/31534Planning and Execution in Multi-Agent Path Finding: Models and Algorithms2024-05-30T05:53:14-07:00Yue Zhangyue.zhang@monash.eduZhe Chenzhe.chen@monash.eduDaniel Harabordaniel.harabor@monash.eduPierre Le Bodicpierre.lebodic@monash.eduPeter J. Stuckeypeter.stuckey@monash.eduIn applications of Multi-Agent Path Finding (MAPF), it is often the sum of planning and execution times that needs to be minimised (i.e., the Goal Achievement Time). Yet current methods seldom optimise for this objective. Optimal algorithms reduce execution time, but may require exponential planning time. Non-optimal algorithms reduce planning time, but at the expense of increased path length. To address these limitations, we introduce PIE (Planning and Improving while Executing), a new framework for concurrent planning and execution in MAPF. We show how different instantiations of PIE affect practical performance, including initial planning time, action commitment time, and concurrent vs. sequential planning and execution. We then adapt PIE to Lifelong MAPF, a popular application setting where agents are continuously assigned new goals and where additional decisions are required to ensure feasibility. We examine a variety of approaches to overcome these challenges and conduct comparative experiments against recently proposed alternatives. Results show that PIE substantially outperforms existing methods for One-shot and Lifelong MAPF.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligence

https://ojs.aaai.org/index.php/ICAPS/article/view/31535Decentralized, Decomposition-Based Observation Scheduling for a Large-Scale Satellite Constellation2024-05-30T05:53:15-07:00Itai Zilbersteinitai.m.zilberstein@jpl.nasa.govAnanya Raoananyara@andrew.cmu.eduMatthew Salismatthew.salis@jpl.nasa.govSteve Chiensteve.a.chien@jpl.nasa.govDeploying multi-satellite constellations for Earth observation requires coordinating potentially hundreds of spacecraft. With increasing on-board capability for autonomy, we can view the constellation as a multi-agent system (MAS) and employ decentralized scheduling solutions. We formulate the problem as a distributed constraint optimization problem (DCOP) and aim for scalable inter-agent communication. The problem consists of millions of variables, which, coupled with its structure, makes existing DCOP algorithms inadequate for this application. We develop a scheduling approach that employs a well-coordinated heuristic, referred to as the Geometric Neighborhood Decomposition (GND) heuristic, to decompose the global DCOP into sub-problems so as to enable the application of DCOP algorithms. We present the Neighborhood Stochastic Search (NSS) algorithm, a decentralized algorithm to effectively solve the multi-satellite constellation observation scheduling problem using decomposition. In summary, we identify the roadblocks to deploying DCOP solvers on a large-scale, real-world problem, propose a decomposition-based scheduling approach that is effective at tackling large-scale DCOPs, empirically evaluate the approach against other baseline algorithms to demonstrate its effectiveness, and discuss the generality of the approach.2024-05-30T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligence
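Finally, to give a flavour of the decomposition described in the last entry, here is a small Python sketch that bins satellites into longitude bands (a crude stand-in for the Geometric Neighborhood Decomposition heuristic) and then improves each band's sub-schedule by simple stochastic local search; the band width, the toy coverage objective, and the hill-climbing rule are illustrative assumptions, not the NSS algorithm itself.

import random

def geometric_neighborhoods(longitudes, width_deg=30.0):
    """Group satellites into fixed-width longitude bands
    (a crude stand-in for the paper's GND heuristic)."""
    bands = {}
    for sat, lon in longitudes.items():
        bands.setdefault(int(lon // width_deg), []).append(sat)
    return list(bands.values())

def local_score(schedule, sats):
    """Toy local objective: number of distinct targets covered in a band."""
    return len({schedule[s] for s in sats})

def neighborhood_stochastic_search(schedule, bands, targets, iters=200, seed=0):
    """Improve each band independently by random reassignments,
    keeping a change only if the band's local objective does not get worse."""
    rng = random.Random(seed)
    for sats in bands:
        for _ in range(iters):
            sat = rng.choice(sats)
            old = schedule[sat]
            schedule[sat] = rng.choice(targets)
            if local_score(schedule, sats) < local_score({**schedule, sat: old}, sats):
                schedule[sat] = old  # revert strictly worsening moves
    return schedule

rng = random.Random(1)
longitudes = {f"sat{i}": rng.uniform(0.0, 360.0) for i in range(12)}
schedule = {s: 0 for s in longitudes}  # every satellite starts on target 0
bands = geometric_neighborhoods(longitudes)
schedule = neighborhood_stochastic_search(schedule, bands, targets=[0, 1, 2])
print(len(bands), "bands; targets used:", sorted(set(schedule.values())))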