Zhang, Hongbo, Guang Wang, Xu Wang, Zhengyang Zhou, Chen Zhang, Zheng Dong, and Yang Wang. “NondBREM: Nondeterministic Offline Reinforcement Learning for Large-Scale Order Dispatching.” Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 1 (March 25, 2024): 401–409. Accessed May 13, 2026. https://ojs.aaai.org/index.php/AAAI/article/view/27794.