[1] Z. Cao, H. Guo, J. Zhang, F. Oliehoek, and U. Fastenrath, “Maximizing the Probability of Arriving on Time: A Practical Q-Learning Method”, AAAI, vol. 31, no. 1, Feb. 2017.