Le, Hung, et al. “Episodic Policy Gradient Training”. Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 7, June 2022, pp. 7317-25, doi:10.1609/aaai.v36i7.20694.