Multi-Agent Pointer Transformer: Seq-to-Seq Reinforcement Learning for Multi-Vehicle Dynamic Pickup-Delivery Problems

Zengyu Zou; Jingyuan Wang; Yixuan Huang; Junjie Wu

doi:10.1609/aaai.v40i19.38700

Authors

Zengyu Zou: School of Computer Science and Engineering, Beihang University, Beijing, China; MOE Engineering Research Center of Advanced Computer Application Technology, Beihang University, China
Jingyuan Wang: School of Computer Science and Engineering, Beihang University, Beijing, China; School of Economics and Management, Beihang University, Beijing, China; MIIT Key Laboratory of Data Intelligence and Management, Beihang University, Beijing, China; MOE Engineering Research Center of Advanced Computer Application Technology, Beihang University, China
Yixuan Huang: School of Computer Science and Engineering, Beihang University, Beijing, China; MOE Engineering Research Center of Advanced Computer Application Technology, Beihang University, China
Junjie Wu: School of Economics and Management, Beihang University, Beijing, China; MIIT Key Laboratory of Data Intelligence and Management, Beihang University, Beijing, China

DOI:

https://doi.org/10.1609/aaai.v40i19.38700

Abstract

This paper addresses the cooperative Multi-Vehicle Dynamic Pickup and Delivery Problem with Stochastic Requests (MVDPDPSR) and proposes an end-to-end centralized decision-making framework based on sequence-to-sequence modeling, named Multi-Agent Pointer Transformer (MAPT). MVDPDPSR is an extension of the vehicle routing problem and a spatio-temporal system optimization problem, widely applied in scenarios such as on-demand delivery. Classical operations research methods face bottlenecks in computational complexity and time efficiency when handling large-scale dynamic problems. Although existing reinforcement learning methods have made some progress, they still face several challenges: 1) independent decoding across multiple vehicles fails to model joint action distributions; 2) the feature extraction network struggles to capture inter-entity relationships; 3) the joint action space is exponentially large. To address these issues, we design the MAPT framework, which employs a Transformer encoder to extract entity representations, combines a Transformer decoder with a Pointer Network to generate joint action sequences in an autoregressive manner, and introduces a Relation-Aware Attention module to capture inter-entity relationships. Additionally, we guide the model's decision-making using informative priors to facilitate effective exploration. Experiments on eight datasets demonstrate that MAPT significantly outperforms existing baseline methods and offers substantial computational time advantages over classical operations research methods.
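
To make the idea of autoregressive joint-action decoding concrete, the following is a minimal sketch and not the authors' implementation. It assumes entity and vehicle embeddings are already available (random arrays stand in for the encoder output), scores candidates with a plain scaled dot product in place of the Transformer decoder and Pointer Network, and omits the Relation-Aware Attention module and the informative priors. The function name `decode_joint_action`, the running-sum context, and the rule that two vehicles may not select the same entity in one step are all illustrative assumptions. The point it shows is the factorization of the joint action probability into per-vehicle conditionals, p(a_1, ..., a_K) = prod_k p(a_k | a_1, ..., a_{k-1}), which avoids enumerating the exponentially large joint action space.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax; entries equal to -inf get probability 0."""
    x = x - np.max(x)
    e = np.exp(x)
    return e / e.sum()


def decode_joint_action(entity_emb, vehicle_emb, rng=None, greedy=True):
    """Pick one target entity per vehicle, one vehicle at a time.

    entity_emb:  (N, d) candidate-entity embeddings (stand-in for encoder output).
    vehicle_emb: (K, d) vehicle embeddings, with K <= N (assumed).
    Returns (actions, log_prob): actions[k] is the entity index chosen for
    vehicle k, and log_prob is the log-probability of the whole joint action.
    """
    n, d = entity_emb.shape
    taken = np.zeros(n, dtype=bool)   # hypothetical rule: no entity chosen twice
    context = np.zeros(d)             # summary of the choices made so far
    actions, log_prob = [], 0.0
    for v in vehicle_emb:
        # Condition the query on earlier vehicles' choices (autoregression).
        query = v + context
        # Pointer-style scores: one logit per candidate entity.
        scores = entity_emb @ query / np.sqrt(d)
        scores = np.where(taken, -np.inf, scores)
        probs = softmax(scores)
        if greedy:
            a = int(np.argmax(probs))
        else:
            a = int(rng.choice(n, p=probs))
        # Chain rule: the joint log-probability is the sum of conditionals.
        log_prob += float(np.log(probs[a]))
        taken[a] = True
        context = context + entity_emb[a]
        actions.append(a)
    return actions, log_prob


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    entities = rng.normal(size=(5, 8))   # 5 candidate entities, d = 8
    vehicles = rng.normal(size=(3, 8))   # 3 vehicles
    print(decode_joint_action(entities, vehicles))
```

Because each vehicle's choice is conditioned on the choices already made, the decoding cost grows linearly with the number of vehicles, whereas scoring every joint assignment directly would grow exponentially.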
