WEINSTEIN, A.; LITTMAN, M. Bandit-Based Planning and Learning in Continuous-Action Markov Decision Processes. Proceedings of the International Conference on Automated Planning and Scheduling, [S. l.], v. 22, n. 1, p. 306-314, 2012. DOI: 10.1609/icaps.v22i1.13507. Available at: https://ojs.aaai.org/index.php/ICAPS/article/view/13507. Accessed: 4 May 2024.