Wang, B., Li, J., Chen, H., Chu, Y., Fan, Y., & Hu, X. (2026). Deconstructing Pre-training: Knowledge Attribution Analysis in MoE and Dense Models. Proceedings of the AAAI Conference on Artificial Intelligence, 40(39), 33359–33367. https://doi.org/10.1609/aaai.v40i39.40622