WEINSTEIN, Ari; LITTMAN, Michael. Bandit-Based Planning and Learning in Continuous-Action Markov Decision Processes. Proceedings of the International Conference on Automated Planning and Scheduling, [S. l.], v. 22, n. 1, p. 306–314, 2012. DOI: 10.1609/icaps.v22i1.13507. Available at: https://ojs.aaai.org/index.php/ICAPS/article/view/13507. Accessed: 26 May 2026.