[1] S. Xu, “RLKD: Distilling LLMs’ Reasoning via Reinforcement Learning”, AAAI, vol. 40, no. 40, pp. 34151–34159, Mar. 2026.