Zhang, H., Wang, G., Wang, X., Zhou, Z., Zhang, C., Dong, Z., & Wang, Y. (2024). NondBREM: Nondeterministic Offline Reinforcement Learning for Large-Scale Order Dispatching. Proceedings of the AAAI Conference on Artificial Intelligence, 38(1), 401–409. https://doi.org/10.1609/aaai.v38i1.27794