Do We Truly Need So Many Samples? Multi-LLM Repeated Sampling Efficiently Scales Test-Time Compute

Authors

  • Jianhao Chen (State Key Laboratory for Novel Software Technology, Nanjing University; Shanghai Artificial Intelligence Laboratory)
  • Zishuo Xun (Shanghai Artificial Intelligence Laboratory; University of Auckland)
  • Bocheng Zhou (Shanghai Artificial Intelligence Laboratory)
  • Han Qi (Shanghai Artificial Intelligence Laboratory)
  • Hangfan Zhang (The Pennsylvania State University)
  • Qiaosheng Zhang (Shanghai Artificial Intelligence Laboratory)
  • Yang Chen (Shanghai Artificial Intelligence Laboratory)
  • Wei Hu (State Key Laboratory for Novel Software Technology, Nanjing University)
  • Yuzhong Qu (State Key Laboratory for Novel Software Technology, Nanjing University)
  • Shuyue Hu (Shanghai Artificial Intelligence Laboratory)

DOI:

https://doi.org/10.1609/aaai.v40i24.39094

Abstract

This paper presents a simple, effective, and cost-efficient strategy, named ModelSwitch, to improve LLM performance by scaling test-time compute. ModelSwitch builds upon the repeated-sampling-then-voting framework, with a novel twist: incorporating multiple models, even weaker ones, to leverage their complementary strengths that potentially arise from diverse training data and paradigms. By using sample consistency as a signal, our strategy dynamically switches between models. Theoretical analysis highlights the efficiency and performance advantages of our strategy. Extensive experiments on seven datasets demonstrate that our strategy not only outperforms self-consistency and state-of-the-art multi-agent debate approaches, but also significantly reduces inference costs. Additionally, our strategy requires only a few comparable LLMs to achieve optimal performance and can be extended with verification methods, demonstrating the potential of leveraging multiple LLMs in the generation-verification paradigm.

Published

2026-03-14

How to Cite

Chen, J., Xun, Z., Zhou, B., Qi, H., Zhang, H., Zhang, Q., Chen, Y., Hu, W., Qu, Y., & Hu, S. (2026). Do We Truly Need So Many Samples? Multi-LLM Repeated Sampling Efficiently Scales Test-Time Compute. Proceedings of the AAAI Conference on Artificial Intelligence, 40(24), 20083-20091. https://doi.org/10.1609/aaai.v40i24.39094

Section

AAAI Technical Track on Machine Learning I