Truncated Counterfactual Learning for Anytime Multi-Agent Path Finding

Authors

  • Thomy Phan, University of Bayreuth, Germany
  • Shao-Hung Chan, University of Southern California, USA
  • Sven Koenig, University of California, Irvine, USA; Örebro University, Sweden

DOI:

https://doi.org/10.1609/aaai.v40i35.40207

Abstract

Anytime multi-agent path finding (MAPF) is a promising approach to scalable and collision-free path optimization in multi-agent systems. MAPF-LNS, based on Large Neighborhood Search (LNS), is the current state-of-the-art approach where a fast initial solution is iteratively optimized by destroying and repairing selected paths, i.e., a neighborhood, of the solution. Delay-based MAPF-LNS has demonstrated particular effectiveness in generating promising neighborhoods via seed agents, according to their delays. Seed agents are selected using handcrafted strategies or online learning, where the former relies on human intuition about underlying structures, while the latter conducts black-box optimization, ignoring any structure. In this paper, we propose Truncated Adaptive Counterfactual K-ranked LEarning (TACKLE) to select seed agents via informed online learning by leveraging handcrafted strategies as human intuition. We show theoretically that TACKLE dominates its handcrafted and black-box learning counterparts in the limit. Our experiments demonstrate cost improvements of at least 60% in instances with one thousand agents, compared with state-of-the-art anytime solvers.

Published

2026-03-14

How to Cite

Phan, T., Chan, S.-H., & Koenig, S. (2026). Truncated Counterfactual Learning for Anytime Multi-Agent Path Finding. Proceedings of the AAAI Conference on Artificial Intelligence, 40(35), 29633-29641. https://doi.org/10.1609/aaai.v40i35.40207

Issue

Vol. 40 No. 35

Section

AAAI Technical Track on Multiagent Systems