[1] C. Wang, H. He, L. Zhao, X. Deng, L. Duan, and S. Wan, “CasMoE: A Cascaded Framework for Efficient MoE Inference on Resource-constrained Devices,” AAAI, vol. 40, no. 31, pp. 26133–26141, Mar. 2026.