Vuong, H. T., Le, T., Tran, Q., Van, L. N., & Le, T. (2026). MCW-KD: Multi-Cost Wasserstein Knowledge Distillation for Large Language Models. Proceedings of the AAAI Conference on Artificial Intelligence, 40(39), 33332–33340. https://doi.org/10.1609/aaai.v40i39.40619