[1] G. Zhang, Y. Wang, X. Chen, H. Qian, K. Zhan, and B. Wang, “UNEX-RL: Reinforcing Long-Term Rewards in Multi-Stage Recommender Systems with UNidirectional EXecution,” AAAI, vol. 38, no. 8, pp. 9305-9313, Mar. 2024.