Vuong, Hoang Tran, Tue Le, Quyen Tran, Linh Ngo Van, and Trung Le. 2026. “MCW-KD: Multi-Cost Wasserstein Knowledge Distillation for Large Language Models”. Proceedings of the AAAI Conference on Artificial Intelligence 40 (39):33332-40. https://doi.org/10.1609/aaai.v40i39.40619.