[1] T. Fu, “Two Heads Are Better than One: Distilling Large Language Model Features into Small Models with Feature Decomposition and Mixture”, AAAI, vol. 40, no. 23, pp. 19082–19090, Mar. 2026.