Xu, Z., I. Gavran, Y. Ahmad, R. Majumdar, D. Neider, U. Topcu, and B. Wu. “Joint Inference of Reward Machines and Policies for Reinforcement Learning”. Proceedings of the International Conference on Automated Planning and Scheduling, vol. 30, no. 1, June 2020, pp. 590-598, doi:10.1609/icaps.v30i1.6756.