NondBREM: Nondeterministic Offline Reinforcement Learning for Large-Scale Order Dispatching
DOI:
https://doi.org/10.1609/aaai.v38i1.27794
Keywords:
APP: Transportation, APP: Mobility, Driving & Flight, DMKM: Mining of Spatial, Temporal or Spatio-Temporal Data
Abstract
One of the most important tasks in ride-hailing is order dispatching, i.e., assigning unserved orders to available drivers. Order dispatching has recently improved significantly thanks to advances in reinforcement learning, which has proven effective at addressing sequential decision-making problems of this kind. However, most existing reinforcement learning methods require agents to learn the optimal policy by interacting with the environment online, which is challenging or impractical for real-world deployment due to high costs or safety concerns. For example, because supply and demand are spatiotemporally unbalanced, online reinforcement learning-based order dispatching may significantly hurt the revenue of the ride-hailing platform and the passenger experience during the policy learning period. Hence, in this work, we develop an offline deep reinforcement learning framework called NondBREM for large-scale order dispatching, which learns a policy solely from accumulated logged data to avoid costly and unsafe interactions with the environment. In NondBREM, a Nondeterministic Batch-Constrained Q-learning (NondBCQ) module is developed to reduce the extrapolation error of the algorithm, and a Random Ensemble Mixture (REM) module that integrates multiple value networks with multi-head networks is used to improve model generalization and robustness. Extensive experiments on large-scale real-world ride-hailing datasets show the superiority of our design.
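As a rough illustration of the ensemble idea mentioned in the abstract, the sketch below mixes the Q-value estimates of several heads with random convex weights, which is the general Random Ensemble Mixture technique. It is not the authors' implementation: the function name `rem_mix`, the toy Q-values, and the choice of three heads and three candidate actions are all assumptions made for this example.

```python
import random


def rem_mix(head_q, rng):
    """Combine K Q-value heads with random convex weights (generic REM idea).

    head_q: list of K lists, each holding one head's Q-value per action.
    rng:    a random.Random instance, passed in for reproducibility.
    Returns the mixed Q-values and the weights that were drawn.
    """
    k = len(head_q)
    # Draw K non-negative weights and normalise them so they sum to 1.
    raw = [rng.random() for _ in range(k)]
    total = sum(raw)
    alpha = [w / total for w in raw]
    # The mixed Q-value of each action is the weighted sum over heads.
    n_actions = len(head_q[0])
    mixed = [
        sum(alpha[i] * head_q[i][a] for i in range(k))
        for a in range(n_actions)
    ]
    return mixed, alpha


# Hypothetical toy data: 3 heads scoring 3 candidate driver-order matches.
head_q = [
    [1.0, 2.0, 0.5],
    [0.8, 2.5, 0.7],
    [1.2, 1.9, 0.4],
]
mixed, alpha = rem_mix(head_q, random.Random(0))
best_action = max(range(len(mixed)), key=mixed.__getitem__)
```

Because the weights are convex, each mixed value stays between the smallest and largest head estimate for that action, so a single over-optimistic head cannot dominate the result on its own.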
Published
2024-03-25
How to Cite
Zhang, H., Wang, G., Wang, X., Zhou, Z., Zhang, C., Dong, Z., & Wang, Y. (2024). NondBREM: Nondeterministic Offline Reinforcement Learning for Large-Scale Order Dispatching. Proceedings of the AAAI Conference on Artificial Intelligence, 38(1), 401-409. https://doi.org/10.1609/aaai.v38i1.27794
Section
AAAI Technical Track on Application Domains