[1] T. Morimura, T. Osogami, and T. Shirai, “Mixing-Time Regularized Policy Gradient,” AAAI, vol. 28, no. 1, Jun. 2014.