Schmitt, Simon, John Shawe-Taylor, and Hado van Hasselt. 2022. “Chaining Value Functions for Off-Policy Learning.” Proceedings of the AAAI Conference on Artificial Intelligence 36 (8): 8187–95. https://doi.org/10.1609/aaai.v36i8.20792.