[1] H. T. Vuong, T. Le, Q. Tran, L. N. Van, and T. Le, “MCW-KD: Multi-Cost Wasserstein Knowledge Distillation for Large Language Models,” AAAI, vol. 40, no. 39, pp. 33332–33340, Mar. 2026.