Multi-Agent Pointer Transformer: Seq-to-Seq Reinforcement Learning for Multi-Vehicle Dynamic Pickup-Delivery Problems

Authors

  • Zengyu Zou: School of Computer Science and Engineering, Beihang University, Beijing, China; MOE Engineering Research Center of Advanced Computer Application Technology, Beihang University, China
  • Jingyuan Wang: School of Computer Science and Engineering, Beihang University, Beijing, China; School of Economics and Management, Beihang University, Beijing, China; MIIT Key Laboratory of Data Intelligence and Management, Beihang University, Beijing, China; MOE Engineering Research Center of Advanced Computer Application Technology, Beihang University, China
  • Yixuan Huang: School of Computer Science and Engineering, Beihang University, Beijing, China; MOE Engineering Research Center of Advanced Computer Application Technology, Beihang University, China
  • Junjie Wu: School of Economics and Management, Beihang University, Beijing, China; MIIT Key Laboratory of Data Intelligence and Management, Beihang University, Beijing, China

DOI:

https://doi.org/10.1609/aaai.v40i19.38700

Abstract

This paper addresses the cooperative Multi-Vehicle Dynamic Pickup and Delivery Problem with Stochastic Requests (MVDPDPSR) and proposes an end-to-end, centralized decision-making framework based on sequence-to-sequence modeling, named Multi-Agent Pointer Transformer (MAPT). MVDPDPSR extends the vehicle routing problem and is a spatio-temporal system optimization problem with wide application in scenarios such as on-demand delivery. Classical operations research methods face bottlenecks in computational complexity and time efficiency when handling large-scale dynamic problems. Although existing reinforcement learning methods have made some progress, they still face several challenges: 1) independent decoding across multiple vehicles fails to model the joint action distribution; 2) the feature extraction network struggles to capture inter-entity relationships; 3) the joint action space is exponentially large. To address these issues, we design the MAPT framework, which employs a Transformer encoder to extract entity representations, combines a Transformer decoder with a Pointer Network to generate joint action sequences autoregressively, and introduces a Relation-Aware Attention module to capture inter-entity relationships. Additionally, we guide the model's decision-making with informative priors to facilitate effective exploration. Experiments on 8 datasets demonstrate that MAPT significantly outperforms existing baseline methods and offers substantial computation-time advantages over classical operations research methods.
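
To illustrate what generating a joint action autoregressively with a pointer mechanism can look like, the following is a minimal sketch and not the MAPT implementation. It assumes each vehicle already has a query vector and each candidate node a key vector (in MAPT these would come from the Transformer encoder and decoder). The names pointer_decode, vehicle_queries, and node_keys, the additive context update, and the rule that a node may be selected by only one vehicle are all hypothetical simplifications. The sketch decodes greedily, one vehicle at a time, so that the joint log-probability is the sum of per-vehicle conditional log-probabilities rather than a product of independent choices.

```python
import math


def softmax(scores):
    """Numerically stable softmax; entries equal to -inf get probability 0."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pointer_decode(vehicle_queries, node_keys):
    """Greedy autoregressive pointer decoding (illustrative sketch only).

    Assumes there are at least as many nodes as vehicles, so that at
    least one unmasked node remains at every step.
    """
    dim = len(node_keys[0])
    taken = set()            # nodes already assigned to earlier vehicles
    context = [0.0] * dim    # hypothetical summary of earlier choices
    joint_action = []
    joint_log_prob = 0.0

    for query in vehicle_queries:
        # Condition this vehicle's query on the actions chosen so far.
        q = [a + c for a, c in zip(query, context)]

        # Pointer scores: scaled dot product between query and each node key.
        # Masked nodes get -inf so their probability is exactly 0.
        scores = [
            -math.inf if j in taken
            else sum(a * b for a, b in zip(q, key)) / math.sqrt(dim)
            for j, key in enumerate(node_keys)
        ]
        probs = softmax(scores)

        # Greedy choice; a sampling policy would draw from probs instead.
        choice = max(range(len(probs)), key=probs.__getitem__)

        joint_action.append(choice)
        joint_log_prob += math.log(probs[choice])
        taken.add(choice)
        context = [c + k for c, k in zip(context, node_keys[choice])]

    return joint_action, joint_log_prob
```

Because each step scores only the candidate nodes for one vehicle, the policy never enumerates the exponentially large joint action space; it makes one pointer selection per vehicle, each conditioned on the previous ones.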

Published

2026-03-14

How to Cite

Zou, Z., Wang, J., Huang, Y., & Wu, J. (2026). Multi-Agent Pointer Transformer: Seq-to-Seq Reinforcement Learning for Multi-Vehicle Dynamic Pickup-Delivery Problems. Proceedings of the AAAI Conference on Artificial Intelligence, 40(19), 16593–16601. https://doi.org/10.1609/aaai.v40i19.38700

Issue

Vol. 40 No. 19 (2026)

Section

AAAI Technical Track on Data Mining & Knowledge Management III