Wang, X., Song, Y., Tian, Z., Liu, B., Luo, T., & Huang, M. (2026). DPRM: A Dual Implicit Process Reward Model in Multi-Hop Question Answering. Proceedings of the AAAI Conference on Artificial Intelligence, 40(40), 33683–33691. https://doi.org/10.1609/aaai.v40i40.40658