De Asis, Kristopher, Alan Chan, Silviu Pitis, Richard Sutton, and Daniel Graves. 2020. “Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning.” Proceedings of the AAAI Conference on Artificial Intelligence 34 (04): 3741-48. https://doi.org/10.1609/aaai.v34i04.5784.