XU, Zhe; GAVRAN, Ivan; AHMAD, Yousef; MAJUMDAR, Rupak; NEIDER, Daniel; TOPCU, Ufuk; WU, Bo. Joint Inference of Reward Machines and Policies for Reinforcement Learning. Proceedings of the International Conference on Automated Planning and Scheduling, [S. l.], v. 30, n. 1, p. 590–598, 2020. DOI: 10.1609/icaps.v30i1.6756. Available at: https://ojs.aaai.org/index.php/ICAPS/article/view/6756. Accessed on: 28 May 2026.