Vuong, Hoang Tran, Tue Le, Quyen Tran, Linh Ngo Van, and Trung Le. “MCW-KD: Multi-Cost Wasserstein Knowledge Distillation for Large Language Models.” Proceedings of the AAAI Conference on Artificial Intelligence 40, no. 39 (March 14, 2026): 33332–33340. Accessed May 16, 2026. https://ojs.aaai.org/index.php/AAAI/article/view/40619.