Symphony-MoE: Harmonizing Disparate Pre-trained Models into a Coherent Mixture-of-Experts

Authors

  • Qi Wang, Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences; Peng Cheng Laboratory; University of Chinese Academy of Sciences; State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences
  • Hanyang Peng, Peng Cheng Laboratory
  • Yue Yu, Peng Cheng Laboratory

DOI:

https://doi.org/10.1609/aaai.v40i31.39847

Abstract

Mixture-of-Experts (MoE) models scale capacity efficiently by activating only a sparse subset of their parameters per token, keeping computational overhead low. To mitigate the prohibitive cost of training MoEs from scratch, recent work employs upcycling, which reuses a single pre-trained dense model by replicating its feed-forward network (FFN) layers into experts. However, this limits expert diversity, since every expert originates from the same dense model. This paper addresses that limitation by constructing powerful MoE models from experts sourced from multiple identically-architected but disparate pre-trained models (e.g., Qwen2.5-Coder and Qwen2). A key challenge is that these source models occupy disparate, dissonant regions of the parameter space, making direct upcycling prone to severe performance degradation. To overcome this, we propose Symphony-MoE, a novel two-stage framework designed to harmonize these models into a single, coherent expert mixture. First, we establish this harmony in a training-free manner: we construct a shared backbone via a layer-aware fusion strategy and, crucially, alleviate parameter misalignment among experts using activation-based functional alignment. Subsequently, a stage of post-training coordinates the entire architecture. Experiments demonstrate that our method successfully integrates experts from heterogeneous sources, achieving an MoE model that significantly surpasses baselines in multi-domain tasks and out-of-distribution generalization.
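The abstract does not spell out how activation-based functional alignment is implemented. Below is a minimal sketch of what such an alignment step could look like for one FFN expert against a reference expert: record each hidden unit's activations on a shared calibration batch, match units between the two experts by activation similarity, and permute the expert's weights so matched units share the same index. It assumes a SwiGLU-style FFN (gate/up/down projections) as used in Qwen-family models; all names and the exact similarity criterion are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch of activation-based functional alignment for one FFN expert.
# Assumes weights are dicts {"gate_proj": (H, D), "up_proj": (H, D), "down_proj": (D, H)}.
import torch
from scipy.optimize import linear_sum_assignment


def hidden_activations(ffn, x):
    """SwiGLU pre-down-projection activations for calibration inputs x of shape (N, D)."""
    return torch.nn.functional.silu(x @ ffn["gate_proj"].T) * (x @ ffn["up_proj"].T)


def align_expert_to_reference(expert, reference, calib_x):
    """Permute `expert`'s hidden units so they functionally line up with `reference`."""
    a = hidden_activations(reference, calib_x)  # (N, H)
    b = hidden_activations(expert, calib_x)     # (N, H)

    # Cosine similarity between every reference unit and every expert unit,
    # computed over the calibration tokens.
    a = torch.nn.functional.normalize(a, dim=0)
    b = torch.nn.functional.normalize(b, dim=0)
    sim = (a.T @ b).detach().cpu().numpy()      # (H, H)

    # One-to-one matching that maximizes total similarity (Hungarian algorithm).
    _, col_ind = linear_sum_assignment(-sim)
    perm = torch.as_tensor(col_ind)

    # Apply the permutation: rows of gate/up projections, columns of the down projection.
    return {
        "gate_proj": expert["gate_proj"][perm],
        "up_proj": expert["up_proj"][perm],
        "down_proj": expert["down_proj"][:, perm],
    }
```

Because an FFN is invariant to a joint permutation of its hidden units (permuting gate/up rows and down columns together leaves its input-output function unchanged), a step of this kind can reduce parameter misalignment between experts without altering what any individual expert computes.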

Published

2026-03-14

How to Cite

Wang, Q., Peng, H., & Yu, Y. (2026). Symphony-MoE: Harmonizing Disparate Pre-trained Models into a Coherent Mixture-of-Experts. Proceedings of the AAAI Conference on Artificial Intelligence, 40(31), 26407–26415. https://doi.org/10.1609/aaai.v40i31.39847

Section

AAAI Technical Track on Machine Learning VIII