Vuong, H. T. (2026) “MCW-KD: Multi-Cost Wasserstein Knowledge Distillation for Large Language Models”, Proceedings of the AAAI Conference on Artificial Intelligence, 40(39), pp. 33332–33340. doi: 10.1609/aaai.v40i39.40619.