RCMoE: A Communication-Efficient Random Compression Framework for Resource-Constrained Mixture-of-Experts Training

Authors

  • Donglei Wu, Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, China; Guangdong Key Laboratory of Industrial Control System Security, Guangzhou, China; Huangpu Research School of Guangzhou University, Guangzhou, China
  • Xiao Cai, Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, China; Guangdong Key Laboratory of Industrial Control System Security, Guangzhou, China; Huangpu Research School of Guangzhou University, Guangzhou, China
  • Jinglei Tan, Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, China; Guangdong Key Laboratory of Industrial Control System Security, Guangzhou, China; Huangpu Research School of Guangzhou University, Guangzhou, China; Information Engineering University, Zhengzhou, China
  • Jinda Jia, Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA
  • Guangming Tan, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
  • Dingwen Tao, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
  • Wen Xia, School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China
  • Zhihong Tian, Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, China; Guangdong Key Laboratory of Industrial Control System Security, Guangzhou, China; Huangpu Research School of Guangzhou University, Guangzhou, China

DOI:

https://doi.org/10.1609/aaai.v40i32.39899

Abstract

Mixture-of-Experts (MoE) architectures with expert parallelism scale LLMs efficiently by activating only a subset of experts per input, avoiding proportional growth in training cost. However, the intensive and heterogeneous communication substantially hinders the efficiency and scalability of MoE training in resource-constrained scenarios. Existing communication compression techniques fall short in MoE training because (i) the intensive communication amplifies compression overhead, compromising training efficiency, and (ii) accumulated compression errors propagate through the network, degrading training quality. In this paper, we propose RCMoE, a communication-efficient Random Compression framework for MoE training with two core modules: (1) Local-Stochastic Quantization compresses the all-to-all communication by stochastically quantizing each row of the experts' intermediate results in parallel, improving compression efficiency and reducing compression error; (2) Probabilistic Thresholding Sparsification compresses the all-reduce communication by sampling large gradients with high probability, reducing computational complexity while maintaining convergence efficiency. Experiments on four typical MoE training tasks show that RCMoE achieves 5.9x-8.1x higher total communication compression ratios and 1.3x-10.1x training speedups compared with state-of-the-art compression techniques while maintaining MoE training accuracy.
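The abstract only names the two compression operators; the sketch below is a rough, hypothetical PyTorch-style illustration of the general ideas (per-row unbiased stochastic quantization of activations, and probabilistic magnitude-based gradient sparsification), not the authors' implementation. All function names and parameters (bits, keep_ratio) are assumptions introduced here for illustration.

```python
import torch

def stochastic_quantize_rows(x: torch.Tensor, bits: int = 8):
    """Hypothetical sketch: quantize each row of a 2-D activation tensor
    independently, rounding stochastically so the quantizer is unbiased."""
    levels = 2 ** bits - 1
    row_min = x.min(dim=1, keepdim=True).values
    row_max = x.max(dim=1, keepdim=True).values
    scale = (row_max - row_min).clamp_min(1e-12) / levels
    # Map each row into [0, levels], then round up or down at random with
    # probability equal to the fractional part (unbiased in expectation).
    scaled = (x - row_min) / scale
    floor = scaled.floor()
    frac = scaled - floor
    q = floor + (torch.rand_like(frac) < frac).float()
    return q.to(torch.uint8), row_min, scale

def dequantize_rows(q: torch.Tensor, row_min: torch.Tensor, scale: torch.Tensor):
    """Reconstruct an approximation of the original rows from the quantized values."""
    return q.float() * scale + row_min

def probabilistic_threshold_sparsify(grad: torch.Tensor, keep_ratio: float = 0.01):
    """Hypothetical sketch: keep each gradient entry with probability
    proportional to its magnitude (capped at 1), so large gradients survive
    with high probability without an exact top-k sort."""
    g = grad.flatten()
    p = (g.abs() * keep_ratio * g.numel() / g.abs().sum().clamp_min(1e-12)).clamp(max=1.0)
    mask = torch.rand_like(p) < p
    # Rescale surviving entries by 1/p so the sparsified gradient is unbiased.
    sparse = torch.where(mask, g / p.clamp_min(1e-12), torch.zeros_like(g))
    return sparse.view_as(grad), mask.view_as(grad)
```

In an expert-parallel pipeline along these lines, the quantized rows and their per-row (min, scale) metadata would replace full-precision activations in the all-to-all exchange, and the sparsified gradients would feed the all-reduce; the sketch above only illustrates the compression operators themselves.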

Published

2026-03-14

How to Cite

Wu, D., Cai, X., Tan, J., Jia, J., Tan, G., Tao, D., … Tian, Z. (2026). RCMoE: A Communication-Efficient Random Compression Framework for Resource-Constrained Mixture-of-Experts Training. Proceedings of the AAAI Conference on Artificial Intelligence, 40(32), 26876–26884. https://doi.org/10.1609/aaai.v40i32.39899

Section

AAAI Technical Track on Machine Learning IX