[1] N. Topin, S. Milani, F. Fang, and M. Veloso, “Iterative Bounding MDPs: Learning Interpretable Policies via Non-Interpretable Methods”, AAAI, vol. 35, no. 11, pp. 9923-9931, May 2021.