Bian, Y., J. Feng, and Y. Shi. “DiffOP: Reinforcement Learning of Optimization-Based Control Policies via Implicit Policy Gradients”. Proceedings of the AAAI Conference on Artificial Intelligence, vol. 40, no. 24, Mar. 2026, pp. 19737-45, doi:10.1609/aaai.v40i24.39055.