(1) Zhang, C.; Li, Y.; Li, J. Policy Search by Target Distribution Learning for Continuous Control. AAAI 2020, 34, 6770-6777.