Wang, Siqi, et al. “Scaling and Transferability of Annealing Strategies in Large Language Model Training.” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 40, no. 40, Mar. 2026, pp. 33639-47, doi:10.1609/aaai.v40i40.40653.