Xu, Zhe, Ivan Gavran, Yousef Ahmad, Rupak Majumdar, Daniel Neider, Ufuk Topcu, and Bo Wu. 2020. “Joint Inference of Reward Machines and Policies for Reinforcement Learning.” Proceedings of the International Conference on Automated Planning and Scheduling 30 (1): 590–98. https://ojs.aaai.org/index.php/ICAPS/article/view/6756.