[1] R. Qiu, R. Wang, G. Yang, X. Li, and Z. Shao, “LPPG-RL: Lexicographically Projected Policy Gradient Reinforcement Learning with Subproblem Exploration”, AAAI, vol. 40, no. 30, pp. 25009–25017, Mar. 2026.