RCMoE: A Communication-Efficient Random Compression Framework for Resource-Constrained Mixture-of-Experts Training

Authors

  • Donglei Wu, Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, China; Guangdong Key Laboratory of Industrial Control System Security, Guangzhou, China; Huangpu Research School of Guangzhou University, Guangzhou, China
  • Xiao Cai, Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, China; Guangdong Key Laboratory of Industrial Control System Security, Guangzhou, China; Huangpu Research School of Guangzhou University, Guangzhou, China
  • Jinglei Tan, Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, China; Guangdong Key Laboratory of Industrial Control System Security, Guangzhou, China; Huangpu Research School of Guangzhou University, Guangzhou, China; Information Engineering University, Zhengzhou, China
  • Jinda Jia, Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA
  • Guangming Tan, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
  • Dingwen Tao, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
  • Wen Xia, School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China
  • Zhihong Tian, Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou, China; Guangdong Key Laboratory of Industrial Control System Security, Guangzhou, China; Huangpu Research School of Guangzhou University, Guangzhou, China

DOI:

https://doi.org/10.1609/aaai.v40i32.39899

Abstract

Mixture-of-Experts (MoE) architectures with expert parallelism scale LLMs efficiently by activating only a subset of experts per input, avoiding proportional growth in training cost. However, the intensive and heterogeneous communication substantially hinders the efficiency and scalability of MoE training in resource-constrained scenarios. Existing communication compression techniques fall short in MoE training because (i) the intensive communication amplifies compression overhead, compromising training efficiency, and (ii) accumulated compression errors propagate through the network, degrading training quality. In this paper, we propose RCMoE, a communication-efficient Random Compression framework for MoE training with two core modules: (1) Local-Stochastic Quantization compresses the all-to-all communication by stochastically quantizing each row of the experts' intermediate results in parallel, improving compression efficiency and reducing compression error; (2) Probabilistic Thresholding Sparsification compresses the all-reduce communication by sampling large gradients with high probability, reducing computational complexity while maintaining convergence efficiency. Experiments on four typical MoE training tasks show that RCMoE achieves 5.9x-8.1x higher total communication compression ratios and 1.3x-10.1x training speedups compared with state-of-the-art compression techniques while maintaining MoE training accuracy.
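The abstract only names the two compression operators; the sketch below is a rough, hypothetical PyTorch-style illustration of the general ideas (per-row unbiased stochastic quantization of activations, and probabilistic magnitude-based gradient sparsification), not the authors' implementation. All function names and parameters (bits, keep_ratio) are assumptions introduced here for illustration.

```python
import torch

def stochastic_quantize_rows(x: torch.Tensor, bits: int = 8):
    """Hypothetical sketch: quantize each row of a 2-D activation tensor
    independently, rounding stochastically so the quantizer is unbiased."""
    levels = 2 ** bits - 1
    row_min = x.min(dim=1, keepdim=True).values
    row_max = x.max(dim=1, keepdim=True).values
    scale = (row_max - row_min).clamp_min(1e-12) / levels
    # Map each row into [0, levels], then round up or down at random with
    # probability equal to the fractional part (unbiased in expectation).
    scaled = (x - row_min) / scale
    floor = scaled.floor()
    frac = scaled - floor
    q = floor + (torch.rand_like(frac) < frac).float()
    return q.to(torch.uint8), row_min, scale

def dequantize_rows(q: torch.Tensor, row_min: torch.Tensor, scale: torch.Tensor):
    """Reconstruct an approximation of the original rows from the quantized values."""
    return q.float() * scale + row_min

def probabilistic_threshold_sparsify(grad: torch.Tensor, keep_ratio: float = 0.01):
    """Hypothetical sketch: keep each gradient entry with probability
    proportional to its magnitude (capped at 1), so large gradients survive
    with high probability without an exact top-k sort."""
    g = grad.flatten()
    p = (g.abs() * keep_ratio * g.numel() / g.abs().sum().clamp_min(1e-12)).clamp(max=1.0)
    mask = torch.rand_like(p) < p
    # Rescale surviving entries by 1/p so the sparsified gradient is unbiased.
    sparse = torch.where(mask, g / p.clamp_min(1e-12), torch.zeros_like(g))
    return sparse.view_as(grad), mask.view_as(grad)
```

In an expert-parallel pipeline along these lines, the quantized rows and their per-row (min, scale) metadata would replace full-precision activations in the all-to-all exchange, and the sparsified gradients would feed the all-reduce; the sketch above only illustrates the compression operators themselves.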

Published

2026-03-14

How to Cite

Wu, D., Cai, X., Tan, J., Jia, J., Tan, G., Tao, D., … Tian, Z. (2026). RCMoE: A Communication-Efficient Random Compression Framework for Resource-Constrained Mixture-of-Experts Training. Proceedings of the AAAI Conference on Artificial Intelligence, 40(32), 26876–26884. https://doi.org/10.1609/aaai.v40i32.39899

Section

AAAI Technical Track on Machine Learning IX