Yue, C., Dong, C., Gao, Y., He, H., Chai, J., Lin, W., & Yin, G. (2026). Promoting Efficient Reasoning with Verifiable Stepwise Reward. Proceedings of the AAAI Conference on Artificial Intelligence, 40(41), 34530–34538. https://doi.org/10.1609/aaai.v40i41.40752