(1) Vuong, H. T.; Le, T.; Tran, Q.; Van, L. N.; Le, T. MCW-KD: Multi-Cost Wasserstein Knowledge Distillation for Large Language Models. AAAI 2026, 40, 33332–33340.