[1] S. Wang, “Scaling and Transferability of Annealing Strategies in Large Language Model Training”, AAAI, vol. 40, no. 40, pp. 33639–33647, Mar. 2026.