Levy, D. and Ermon, S. (2018) “Deterministic Policy Optimization by Combining Pathwise and Score Function Estimators for Discrete Action Spaces”, Proceedings of the AAAI Conference on Artificial Intelligence, 32(1). doi: 10.1609/aaai.v32i1.11822.