NondBREM: Nondeterministic Offline Reinforcement Learning for Large-Scale Order Dispatching

Authors

  • Hongbo Zhang University of Science and Technology of China
  • Guang Wang Florida State University
  • Xu Wang University of Science and Technology of China
  • Zhengyang Zhou University of Science and Technology of China
  • Chen Zhang University of Science and Technology of China
  • Zheng Dong Wayne State University
  • Yang Wang University of Science and Technology of China

DOI:

https://doi.org/10.1609/aaai.v38i1.27794

Keywords:

APP: Transportation, APP: Mobility, Driving & Flight, DMKM: Mining of Spatial, Temporal or Spatio-Temporal Data

Abstract

One of the most important tasks in ride-hailing is order dispatching, i.e., assigning unserved orders to available drivers. Order dispatching has improved significantly in recent years thanks to advances in reinforcement learning, which has been shown to address sequential decision-making problems such as order dispatching effectively. However, most existing reinforcement learning methods require agents to learn the optimal policy by interacting with the environment online, which is challenging or impractical for real-world deployment because of high costs or safety concerns. For example, because supply and demand are spatiotemporally unbalanced, online reinforcement learning-based order dispatching may significantly harm the revenue of the ride-hailing platform and the passenger experience during the policy learning period. Hence, in this work, we develop an offline deep reinforcement learning framework called NondBREM for large-scale order dispatching, which learns a policy solely from accumulated logged data and thereby avoids costly and unsafe interactions with the environment. In NondBREM, a Nondeterministic Batch-Constrained Q-learning (NondBCQ) module is developed to reduce extrapolation error, and a Random Ensemble Mixture (REM) module that integrates multiple value networks with multi-head networks is used to improve model generalization and robustness. Extensive experiments on large-scale real-world ride-hailing datasets show the superiority of our design.
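The two ideas named in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the abstract does not describe how NondBCQ or the REM module are built, so the code below only shows the generic techniques the names refer to, namely mixing several Q-value heads with random convex weights and restricting the greedy action to those well supported by the logged data. The function names, the `threshold` value, and the toy numbers are all assumptions made for illustration.

```python
import numpy as np


def rem_q_values(head_q, rng):
    """Mix K Q-value heads with a random convex combination (REM-style).

    head_q: array of shape (K, num_actions), one row per value head.
    Returns an array of shape (num_actions,).
    """
    k = head_q.shape[0]
    weights = rng.random(k)      # non-negative random weights
    weights /= weights.sum()     # normalize so the weights sum to 1
    return weights @ head_q


def batch_constrained_action(q_values, behavior_probs, threshold=0.3):
    """Pick the greedy action among those well supported by the logged data.

    behavior_probs: estimated probability that the logging policy chose
    each action in this state (hypothetical input for this sketch).
    """
    relative = behavior_probs / behavior_probs.max()
    allowed = relative >= threshold
    # Actions that are rare in the logs are excluded from the argmax.
    masked_q = np.where(allowed, q_values, -np.inf)
    return int(np.argmax(masked_q))


rng = np.random.default_rng(0)

# Toy example: 2 value heads, 3 candidate driver-order assignments.
head_q = np.array([[1.0, 5.0, 2.0],
                   [1.5, 6.0, 2.5]])
# Action 1 has the highest Q-values but almost never appears in the logs.
behavior_probs = np.array([0.6, 0.02, 0.38])

mixed_q = rem_q_values(head_q, rng)
action = batch_constrained_action(mixed_q, behavior_probs)
print(mixed_q, action)
```

In this toy case the action with the highest estimated value is filtered out because it is barely represented in the logged data, which is the kind of extrapolation error a batch-constrained method is meant to avoid.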

Published

2024-03-25

How to Cite

Zhang, H., Wang, G., Wang, X., Zhou, Z., Zhang, C., Dong, Z., & Wang, Y. (2024). NondBREM: Nondeterministic Offline Reinforcement Learning for Large-Scale Order Dispatching. Proceedings of the AAAI Conference on Artificial Intelligence, 38(1), 401-409. https://doi.org/10.1609/aaai.v38i1.27794

Section

AAAI Technical Track on Application Domains