Levy, D., and S. Ermon. “Deterministic Policy Optimization by Combining Pathwise and Score Function Estimators for Discrete Action Spaces”. Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, no. 1, Apr. 2018, doi:10.1609/aaai.v32i1.11822.