Wang, Xinyi, et al. “DPRM: A Dual Implicit Process Reward Model in Multi-Hop Question Answering.” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 40, no. 40, Mar. 2026, pp. 33683-91, doi:10.1609/aaai.v40i40.40658.