Levy, Daniel, and Stefano Ermon. 2018. “Deterministic Policy Optimization by Combining Pathwise and Score Function Estimators for Discrete Action Spaces.” Proceedings of the AAAI Conference on Artificial Intelligence 32 (1). https://doi.org/10.1609/aaai.v32i1.11822.