Levy, D., & Ermon, S. (2018). Deterministic Policy Optimization by Combining Pathwise and Score Function Estimators for Discrete Action Spaces. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1). https://doi.org/10.1609/aaai.v32i1.11822