Fu T, Xu X, Xu W, Chen J, Ren R, Deng B, et al. Two Heads Are Better than One: Distilling Large Language Model Features into Small Models with Feature Decomposition and Mixture. AAAI [Internet]. 2026 Mar. 14 [cited 2026 May 30];40(23):19082-90. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/38981