[1] D. Levy and S. Ermon, “Deterministic Policy Optimization by Combining Pathwise and Score Function Estimators for Discrete Action Spaces”, AAAI, vol. 32, no. 1, Apr. 2018.