Ahmed, S., E. H. Bergou, Y. Wang, and A. Dutta. “Stabilizing Policy Gradient Methods via Reward Profiling”. Proceedings of the AAAI Conference on Artificial Intelligence, vol. 40, no. 24, Mar. 2026, pp. 19560-8, doi:10.1609/aaai.v40i24.39035.