Truncated Counterfactual Learning for Anytime Multi-Agent Path Finding

Thomy Phan; Shao-Hung Chan; Sven Koenig

doi:10.1609/aaai.v40i35.40207

Authors

Thomy Phan, University of Bayreuth, Germany
Shao-Hung Chan, University of Southern California, USA
Sven Koenig, University of California, Irvine, USA; Örebro University, Sweden

DOI:

https://doi.org/10.1609/aaai.v40i35.40207

Abstract

Anytime multi-agent path finding (MAPF) is a promising approach to scalable and collision-free path optimization in multi-agent systems. MAPF-LNS, based on Large Neighborhood Search (LNS), is the current state-of-the-art approach where a fast initial solution is iteratively optimized by destroying and repairing selected paths, i.e., a neighborhood, of the solution. Delay-based MAPF-LNS has demonstrated particular effectiveness in generating promising neighborhoods via seed agents, according to their delays. Seed agents are selected using handcrafted strategies or online learning, where the former relies on human intuition about underlying structures, while the latter conducts black-box optimization, ignoring any structure. In this paper, we propose Truncated Adaptive Counterfactual K-ranked LEarning (TACKLE) to select seed agents via informed online learning by leveraging handcrafted strategies as human intuition. We show theoretically that TACKLE dominates its handcrafted and black-box learning counterparts in the limit. Our experiments demonstrate cost improvements of at least 60% in instances with one thousand agents, compared with state-of-the-art anytime solvers.
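The destroy-and-repair loop with delay-based seed selection described in the abstract can be illustrated with a minimal sketch. This is a generic LNS skeleton under strong simplifying assumptions, not the TACKLE method or the authors' implementation: each agent's path is reduced to a single cost, collisions are not modeled, and every name here (`pick_seed_by_delay`, `make_toy_repair`, `lns`) is hypothetical.

```python
import random
from typing import Callable, Dict, List

# Toy stand-in for a MAPF solution: agent id -> cost of its current path.
Solution = Dict[int, int]


def pick_seed_by_delay(solution: Solution, shortest: Dict[int, int],
                       rng: random.Random) -> int:
    """Sample a seed agent with probability proportional to (delay + 1).

    Delay is taken here as current path cost minus the agent's
    individually shortest path cost (an assumption of this sketch).
    """
    agents = sorted(solution)
    # +1 keeps zero-delay agents selectable.
    weights = [solution[a] - shortest[a] + 1 for a in agents]
    return rng.choices(agents, weights=weights, k=1)[0]


def make_toy_repair(shortest: Dict[int, int]) -> Callable[[Solution, List[int]], Solution]:
    """Hypothetical repair step: shave one unit of cost off each replanned agent."""
    def repair(solution: Solution, neighborhood: List[int]) -> Solution:
        candidate = dict(solution)  # leave the incumbent untouched
        for agent in neighborhood:
            candidate[agent] = max(shortest[agent], candidate[agent] - 1)
        return candidate
    return repair


def lns(initial: Solution, shortest: Dict[int, int],
        repair: Callable[[Solution, List[int]], Solution],
        neighborhood_size: int, iterations: int,
        rng: random.Random) -> Solution:
    """Anytime loop: keep the best solution found so far, accept only improvements."""
    best = dict(initial)
    for _ in range(iterations):
        seed = pick_seed_by_delay(best, shortest, rng)
        others = [a for a in best if a != seed]
        extra = rng.sample(others, min(neighborhood_size - 1, len(others)))
        neighborhood = [seed] + extra
        candidate = repair(best, neighborhood)  # destroy and repair the neighborhood
        if sum(candidate.values()) < sum(best.values()):
            best = candidate
    return best


shortest_costs = {0: 3, 1: 4, 2: 5}
start = {0: 6, 1: 4, 2: 9}
result = lns(start, shortest_costs, make_toy_repair(shortest_costs),
             neighborhood_size=2, iterations=50, rng=random.Random(0))
print(sum(start.values()), "->", sum(result.values()))
```

In a real solver, the repair step would replan collision-free paths for the neighborhood. The seed-selection rule is the component that, according to the abstract, is chosen either by a handcrafted strategy or by online learning.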
