[1]
Wang, C. et al. 2026. CasMoE: A Cascaded Framework for Efficient MoE Inference on Resource-constrained Devices. Proceedings of the AAAI Conference on Artificial Intelligence. 40, 31 (Mar. 2026), 26133–26141. DOI:https://doi.org/10.1609/aaai.v40i31.39816.