Qiu, R., Wang, R., Yang, G., Li, X., & Shao, Z. (2026). LPPG-RL: Lexicographically Projected Policy Gradient Reinforcement Learning with Subproblem Exploration. Proceedings of the AAAI Conference on Artificial Intelligence, 40(30), 25009–25017. https://doi.org/10.1609/aaai.v40i30.39689