Fu, T., Xu, X., Xu, W., Chen, J., Ren, R., Deng, B., … Cao, X. (2026). Two Heads Are Better than One: Distilling Large Language Model Features into Small Models with Feature Decomposition and Mixture. Proceedings of the AAAI Conference on Artificial Intelligence, 40(23), 19082–19090. https://doi.org/10.1609/aaai.v40i23.38981