(1)
Schmitt, S.; Shawe-Taylor, J.; van Hasselt, H. Chaining Value Functions for Off-Policy Learning. AAAI 2022, 36, 8187-8195.