The Avengers: A Routing Recipe for Collective Intelligence in Language Models

Yiqun Zhang; Hao Li; Chenxu Wang; Linyao Chen; Qiaosheng Zhang; Peng Ye; Shi Feng; Xinrun Wang; Jia Xu; Lei Bai; Shuyue Hu

doi:10.1609/aaai.v40i41.40790

Authors

Yiqun Zhang Northeastern University, Shenyang, China; Shanghai Artificial Intelligence Laboratory, Shanghai, China
Hao Li Shanghai Artificial Intelligence Laboratory, Shanghai, China; Northwest Polytechnical University, Xi'an, China
Chenxu Wang Shanghai Artificial Intelligence Laboratory, Shanghai, China; Beijing Institute of Technology, Beijing, China
Linyao Chen Shanghai Artificial Intelligence Laboratory, Shanghai, China; The University of Tokyo, Tokyo, Japan
Qiaosheng Zhang Shanghai Artificial Intelligence Laboratory, Shanghai, China
Peng Ye Shanghai Artificial Intelligence Laboratory, Shanghai, China
Shi Feng Northeastern University, Shenyang, China
Xinrun Wang Singapore Management University, Singapore
Jia Xu Shanghai Artificial Intelligence Laboratory, Shanghai, China
Lei Bai Shanghai Artificial Intelligence Laboratory, Shanghai, China
Shuyue Hu Shanghai Artificial Intelligence Laboratory, Shanghai, China

Abstract

Proprietary models are increasingly dominating the race for ever-larger language models. Can open-source, smaller models remain competitive across a broad range of tasks? In this paper, we present the Avengers, a lightweight framework that leverages the collective intelligence of these smaller models. The Avengers builds upon four lightweight operations: (i) embedding: encode queries using a text embedding model; (ii) clustering: group queries based on their semantic similarity; (iii) scoring: score each model's performance within each cluster; and (iv) voting: improve outputs via repeated sampling and voting. At inference time, each query is embedded and assigned to its nearest cluster. The top-performing model(s) within that cluster are selected to generate the response with repeated sampling. Remarkably, with 10 open-source models (~7B parameters each), the Avengers surpasses GPT-4o, GPT-4.1, and GPT-4.5 in average performance across 15 diverse datasets spanning mathematics, coding, logical reasoning, general knowledge, and affective tasks. In particular, it surpasses GPT-4.1 on mathematics tasks by 18.21% and on code tasks by 7.46%. Furthermore, the Avengers delivers superior out-of-distribution generalization, and remains robust across various embedding models, clustering algorithms, ensemble strategies, data efficiency, and values of its sole parameter, the number of clusters.
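
The four operations described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the hashed bag-of-words `embed` function is a toy stand-in for a real text embedding model, plain k-means is assumed as the clustering algorithm, only the single top-scoring model per cluster is selected, and all function names and the `correct` table layout are hypothetical.

```python
import math
import random
from collections import Counter

def embed(text, dim=16):
    """Toy stand-in for a text embedding model (assumption, not from the paper):
    hashed bag-of-words, L2-normalised."""
    vec = [0.0] * dim
    for tok in text.lower().split():
        vec[sum(ord(ch) for ch in tok) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(vec, centroids):
    """Index of the centroid closest to vec."""
    return min(range(len(centroids)), key=lambda c: sq_dist(vec, centroids[c]))

def kmeans(vectors, k, iters=20, seed=0):
    """Plain k-means; k (the number of clusters) is the only tunable parameter."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(v, centroids)].append(v)
        for c, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def score_models(vectors, correct, centroids):
    """Per-cluster accuracy of each model.
    correct[m][i] is 1 if model m answered validation query i correctly, else 0."""
    totals = {m: [0.0] * len(centroids) for m in correct}
    counts = [0] * len(centroids)
    for i, v in enumerate(vectors):
        c = nearest(v, centroids)
        counts[c] += 1
        for m in correct:
            totals[m][c] += correct[m][i]
    return {m: [t / n if n else 0.0 for t, n in zip(totals[m], counts)]
            for m in totals}

def route(query, centroids, scores):
    """Embed the query, find its nearest cluster, return the best model there."""
    c = nearest(embed(query), centroids)
    return max(scores, key=lambda m: scores[m][c])

def vote(samples):
    """Majority vote over repeated samples from the selected model."""
    return Counter(samples).most_common(1)[0][0]
```

In use, one would embed a set of validation queries, call `kmeans` and `score_models` once offline, and then at inference time call `route` for each incoming query, sample the chosen model several times, and pass the sampled answers to `vote`.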
